The SPARClite™ family is a collection of SPARC-based microprocessors
optimized for use in embedded systems. This manual describes the SPARClite
architecture and discusses system-design issues. It is addressed to hardware
system designers and to system and application programmers. Previous
knowledge of the SPARC architecture is not assumed.


Organization 


Together with the data sheets for each processor, this manual provides all the 
information necessary to use the SPARClite family in embedded-system designs. 
There are six chapters and three appendices: 


Overview—Describes the special features of the SPARClite family; introduces
the SPARClite architecture; lists some of the development-support tools 
available for use in system design with family processors. 


Programmer’s Model—Describes the SPARClite processor as a collection of 
resources available to software. It discusses the processor’s modes of 
operation, the organization of memory, the register set, the supported data 
types and instructions, the on-chip caches, and interrupts and traps. 


Internal Architecture—Discusses the internal organization of the processor, 
and describes each major functional block—the SPARC Integer Unit, Data and 
Instruction Caches, Bus Interface Unit, and Debug Support Unit. 


External Interface—Describes the processor's input, output, and bidirectional 
signals, the operation of the bus, and the system-support functions 
incorporated on-chip to minimize the amount of glue logic necessary in the
external system. 


Programming Considerations—Tells programmers how to use certain 
processor resources—the register windows, for example—to best advantage. 


System Design Considerations—Describes how to interface the processor with 
external hardware; discusses the use of the MB86940 peripheral chip. 


Appendices—Provide a summary of the bits and fields in the control and 
status registers, a complete instruction-set reference, and an index of the 
instructions by operation code. 


Notation 


This manual uses the following notational conventions: 


Active-low signal names are preceded with a dash, as in -RESET. 


Numerals without any special prefix are in base 10. Hexadecimal numerals are
preceded by 0x, and binary numerals are preceded by 0b. Thus, 28 = 0x1C =
0b11100.


Related Literature 


Additional information can be found in the following documents: 


MB86930 SPARClite 32-Bit RISC Microcontroller Data Sheet—Describes the 
MB86930 processor in detail, including complete physical, electrical, and 
timing characteristics. Available from Fujitsu Microelectronics’ Advanced 
Products Division. 


SPARClite Application Notes — Discuss specific design issues in detail. 
Available from Fujitsu Microelectronics’ Advanced Products Division. 





Overview 


The SPARClite family is a collection of SPARC-based microprocessors optimized 
for use in embedded systems. Processors in the SPARClite family conform to the 
SPARC architecture definition; in particular, they are fully compatible with exist- 
ing SPARC code and existing SPARC development environments. The MB86930 
processor is the first member of the SPARClite family. This chapter provides a 
quick introduction to the processor architecture. Subsequent chapters will review 
this material in more detail. 


1.1 General Description 


The MB86930 is a high-performance processor suitable for use in embedded
control applications such as printers, scanners, robotic machinery, telecom
switches and monitors, and I/O subsystems. It operates at clock speeds up to
50 MHz, executing SPARC instructions at a sustained rate of up to 46 MIPS, and
includes 2 Kbytes of instruction cache and 2 Kbytes of data cache on chip. It
is available in a variety of packages, depending on clock-speed and
power-dissipation requirements.


The processor consists of a Harvard (Aiken) architecture Integer Unit (IU) core, 
instruction and data caches, a Bus Interface Unit (BIU), and an In-Circuit 
Emulator Unit (EMU). These units are connected internally over separate 
instruction and data buses, and to external memory and I/O over a unified 
(instruction and data) bus which carries 32 bits of address and 32 bits of data. 
The register file in the IU implements 8 register windows. An integer multiply
unit (MU) within the IU speeds applications which require integer multiplication. 
The processor uses software to emulate floating-point instructions at rates up to 1 
MFLOP. 


The internal instruction and data caches make it possible to sustain a processing 
rate close to one cycle per instruction by providing the IU at 50 MHz with a maxi- 
mum aggregate data throughput of 400 Mbytes/sec (two 32-bit words per cycle). 
The maximum external data throughput is 200 Mbytes/sec (1 word per cycle). In 
many applications, the internal caches make it possible to maintain high through- 
put even with slow external memory; SPARClite is therefore a cost-effective solu- 
tion in embedded control applications that require high processing throughput 
but cannot tolerate the cost of large, high-speed memories. 
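
The throughput figures above follow directly from the clock rate and the bus
widths. The short calculation below is only a check of that arithmetic; it
assumes that 1 Mbyte means 10**6 bytes, and the function name is made up for
this illustration.

```python
# Back-of-the-envelope check of the throughput figures quoted in the text.
CLOCK_HZ = 50_000_000   # 50 MHz clock, from the text
WORD_BYTES = 4          # one 32-bit word


def throughput_mbytes_per_sec(words_per_cycle, clock_hz=CLOCK_HZ):
    """Peak throughput in Mbytes/sec (assuming 1 Mbyte = 10**6 bytes)."""
    return words_per_cycle * WORD_BYTES * clock_hz / 1_000_000


# Internal: one instruction word plus one data word per cycle.
internal = throughput_mbytes_per_sec(2)
# External: a single word per cycle on the unified bus.
external = throughput_mbytes_per_sec(1)

print(internal, external)
```

Under these assumptions the internal figure comes out at 400 Mbytes/sec and
the external figure at 200 Mbytes/sec, matching the numbers given above.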


The MB86930 is designed with Fujitsu’s AS technology, a 1p and 3-level metal 
process with minimum drawn transistor lengths of 0.8 µm. The design of the data
path and other arrayed blocks is fully custom to optimize die area and speed. 
Random control blocks are based on standard cells. All circuits are fully static. 


While it does provide a mechanism for code and data protection, the MB86930 is 
optimized for embedded applications which do not require virtual-to-physical 
address translation. Using an MB86930 processor in a virtual-memory system, 
while possible, would require an external Memory Management Unit for address 
translation. 


1.2 Special Features 


This section lists some of the features which give the MB86930 its speed,
flexibility, and efficiency, and which make it well suited to a wide variety
of low-cost, high-performance embedded systems.


• Fast Instruction Execution: The instruction set is streamlined and hardwired
for fast execution, with most instructions executing in a single cycle. At 50
(40, 30, 20) MHz, the MB86930 executes instructions at a peak rate of 50
(40, 30, 20) MIPS, and can sustain performance of 46 (37, 28, 18) MIPS. The
Integer Unit (IU) features a 5-stage pipeline designed to handle data
interlocks, an optimized branch handler for efficient control transfers, and
a bus interface that handles single-cycle bus accesses to the on-chip cache.


• Large Register Set: An internal register file consisting of 136 registers
organized into eight overlapping windows speeds interrupt response time 
and context switches. The register file minimizes accesses to memory during 
procedure linkages and facilitates passing of parameters and assignment of 
variables, reducing code in many programs. Reduced code, in turn, can fit 
more easily into the instruction cache. 
• On-Chip Caches: On-chip data and instruction caches decouple the processor
from external memory latency. The caches are organized as two-way set- 
associative for improved hit rates, as compared with direct-mapped caches. 


• Cache Locking: Both data and instruction entries can be locked into their
respective caches to ensure deterministic response and highest performance 
for critical or frequently recurring routines. Maximum flexibility has been 
designed into the cache to allow all or selected portions to be locked. 


• Separate Instruction and Data Paths On-Chip: Separate 32-bit instruction and
data buses provide a high-bandwidth interface between the IU and on-chip 
cache. These buses support single cycle instruction execution as well as single 
cycle data transfers with the cache. The on-chip bus design also supports 
future expansion of the MB86930. 


• System Support Functions: The requirement for glue logic between the
MB86930 and the system is minimized by providing programmable chip 
selects, programmable wait-state circuitry, and support for connection to fast 
page-mode DRAM. Multiple bus masters are supported through a simple 
handshake protocol. 


• Clock Generator: To simplify clock design, a crystal can be connected directly
to the on-chip oscillator, or an external clock source can be used. A phase- 
locked loop minimizes the skew between on- and off-chip clocks. 


• Enhanced Instruction Set: The MB86930 incorporates a fast integer multiply
instruction which executes in 5, 3, or 2 cycles for 32-bit, 16-bit, or 8-bit
operands, respectively. An integer divide-step instruction cuts divide times
by a factor of 5 to 10 over previous SPARC implementations. A scan instruction
supports a single-cycle search for the most significant non-sign bit in a word.


• Fully Static Circuit Design: Its static design gives the MB86930 superior noise
immunity. Future members of the SPARClite family will support a low-power 
mode, in which the processor clock can be slowed or stopped for arbitrary 
periods of time to reduce operating current with no loss of internal state. 

• Test and Debug Interface: The MB86930 supports production test through
industry standard JTAG boundary scan. Hardware emulation is supported 
with on-chip breakpoint and single step logic. A dedicated emulator bus 
provides a means to trace transactions between the integer unit and on-chip 
cache. 


1.3 Programmer's Model 


This section briefly introduces those aspects of the SPARClite processor architec- 
ture which are visible to software: the user and supervisor modes of program 
execution; the organization of the address space; the processor’s register set, 
supported data types, and instruction set; the on-chip caches; and interrupts and 
traps. Each of the topics discussed here is developed more fully in subsequent 
chapters. 
1.3.1 Program Modes


The SPARClite architecture supports protection in multitasking environments by 
providing two mutually exclusive modes of program execution, user mode and 
supervisor mode. Certain instructions are privileged, and can only be executed 
when the processor is in supervisor mode. Any attempt to execute a privileged 
instruction in user mode causes a trap. 


Typically, application programs run in user mode, while operating systems run in 
supervisor mode. On reset, the processor is in supervisor mode. To enter user 
mode, software must clear a bit in the Processor State Register. The processor 
enters supervisor mode from user mode only when a hardware reset, an inter- 
rupt, or a trap occurs. 
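
The mode switch described above can be modeled as bit operations on a copy of
the Processor State Register. The sketch below is illustrative only: the bit
positions (S in bit 7, PS in bit 6) are taken from the SPARC version 8
definition rather than from this section, and the function names are
hypothetical.

```python
# Illustrative model of the PSR mode bits. Bit positions are an assumption
# taken from the SPARC version 8 definition, not from this section.
PSR_S = 1 << 7    # supervisor bit: 1 = supervisor mode, 0 = user mode
PSR_PS = 1 << 6   # previous-supervisor bit, records S at the time of a trap


def is_supervisor(psr):
    """True when the modeled PSR value indicates supervisor mode."""
    return bool(psr & PSR_S)


def enter_user_mode(psr):
    """Supervisor software clears the S bit to drop into user mode."""
    return psr & ~PSR_S


def take_trap(psr):
    """On a trap or interrupt, the old S is saved in PS and S is set."""
    saved = PSR_PS if psr & PSR_S else 0
    return (psr & ~PSR_PS) | saved | PSR_S


psr_after_reset = PSR_S                      # reset leaves the processor in supervisor mode
psr_user = enter_user_mode(psr_after_reset)
psr_in_handler = take_trap(psr_user)
```

In this model there is no function that sets S from user mode, mirroring the
rule that only a reset, an interrupt, or a trap returns the processor to
supervisor mode.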


1.3.2 Memory Organization 


The processor can directly address up to 1 Terabyte of memory, organized into 
256 address spaces of 4 GB each. Every external access involves an 8-bit Address 
Space Identifier (ASI), as well as a 32-bit address. The ASI selects one of the 
address spaces, and the 32-bit address selects a location within that space. 
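
One way to picture the addressing scheme is to treat the ASI as the upper
8 bits of a 40-bit address. The sketch below assumes that view purely for
illustration; the function name is hypothetical and nothing here describes
how the bus actually presents the two values.

```python
# Sketch: view the 8-bit ASI and the 32-bit address as one 40-bit quantity.
ASI_COUNT = 256            # 8-bit Address Space Identifier
SPACE_BYTES = 2 ** 32      # 4 GB per address space
TOTAL_BYTES = ASI_COUNT * SPACE_BYTES


def full_address(asi, address):
    """Combine an ASI and a 32-bit address into a single 40-bit value."""
    if not 0 <= asi <= 0xFF:
        raise ValueError("ASI must fit in 8 bits")
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    return (asi << 32) | address


print(hex(full_address(0x0A, 0x00001000)))
```

The total of 256 spaces of 4 GB each is 2**40 bytes, which is the 1 Terabyte
quoted above.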


The uses of four of the address spaces are defined in the SPARC architecture: the
User Instruction, Supervisor Instruction, User Data, and Supervisor Data spaces. 
SPARClite defines additional address spaces, which are used for memory- 
mapped control registers and for the data and instruction caches; two further 
address spaces are reserved for hardware debug. The remaining spaces are 
application-definable; any of them can be used for either data memory or I/O. 
All I/O is memory-mapped. The organization of the entire addressable range is 
illustrated in Figure 1-1. 
8-Bit Address
Space Identifier
(ASI)             Contents of the 4 GB space (32-bit addresses 00000000-FFFFFFFF)

FE-FF             Reserved for Hardware Debug
10-FD             Application-Definable (952 GB)
0F                Data Cache-Data (2 KB implemented)
0E                Data Cache-Tags (512 implemented)
0D                Instruction Cache-Data (2 KB implemented)
0C                Instruction Cache-Tags (512 implemented)
0B                Supervisor Data (4 GB)*
0A                User Data (4 GB)*
09                Supervisor Instruction (4 GB)*
08                User Instruction (4 GB)*
04-07             Application-Definable (16 GB)
00-03             Memory-mapped registers, including the Control Registers
                  (84 B) (See Fig. 1-2, Register Set)

The full range, from 00 00000000 to FF FFFFFFFF, is the Memory and I/O Space
(2^40 addressable bytes).

* Note: Cacheable address spaces.


Figure 1-1. Address-Space Organization 


Loads and stores are the only instructions that cause external accesses. Versions of 
these instructions exist for transferring bytes, half-words, words and double 
words between external memory (or I/O) and processor registers. In user mode, 
only the user instruction and data spaces are accessible; accessing any of the 
remaining 254 address spaces requires the processor to be in supervisor mode. 


The MB86930 processor does not contain memory-management hardware; vir- 
tual-address translation can be handled by software, or by an external memory- 
management unit with the on-chip caches disabled. 
1.3.3 Registers


All registers are 32 bits wide. There are general-purpose registers, whose contents 
have no pre-assigned meaning, and special-purpose registers, which contain control 
and status information or special data values. Some of the special-purpose regis- 
ters are defined in the SPARC architecture; the rest are SPARClite- or device- 
specific. The non-SPARC special-purpose registers are memory-mapped. The 
general-purpose registers, and the special-purpose Y Register, are the only ones 
which can be accessed in user mode. The register set is illustrated in Figure 1-2. 


General-Purpose Registers
    8 Global Registers
    128 Windowed Registers (See Fig. 1-3, Register Windows)

Special-Purpose Registers
    SPARC-Defined Registers (Not Memory-Mapped)
        Processor State Register (PSR)
        Trap Base Register (TBR)
        * Not read/writable
    Memory-Mapped Control Registers
    (See Fig. 1-1, Address-Space Organization)
        Address Mask Registers (AMR <5:0>)
        Wait-State Specifier Registers (WSSR <2:0>)
        Timer Register
        Timer Preload Register
        System Support Control Register

Figure 1-2. Register Set 


General-Purpose Registers 


In the MB86930, there are 136 general-purpose registers; 8 of these are global
registers; the other 128 are divided into 8 overlapping blocks, or windows.
Each window contains 24 registers. Of these, 8 are local to the window, 8 are
“out” registers shared with the adjacent window below, and 8 are “in”
registers shared with the adjacent window above. This organization is
illustrated in Figure 1-3.







Figure 1-3. Register Windows 


At any given time, 32 general-purpose registers can be accessed directly: the 8 
global registers, and the 24 registers of the currently active window. The value in 
the Current Window Pointer (CWP) field of the Processor State Register (PSR) 
determines which window is active. 


The overlap between adjacent windows makes it easy to pass parameters to a 
subroutine. Values to be passed are written to the “out” registers of the current 
window, which are the same as the “in” registers of the adjacent window. A 
SAVE instruction can then be used to decrement the Current Window Pointer, 
making the parameter values available to the subroutine without moving any 
data. A RESTORE instruction can be used to increment the CWP upon return 
from the subroutine. In effect, the general-purpose registers cache the top portion
of the run-time stack. 
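
The window overlap can be made concrete with a small model that maps a window
number and a register number (r0 to r31) onto one of the 136 physical
registers. The layout chosen below is an assumption made for this sketch; the
real physical arrangement is implementation-specific, and only the sharing
pattern (the caller's "out" registers are the callee's "in" registers) comes
from the text. The numbering of globals as r0-r7, outs as r8-r15, locals as
r16-r23, and ins as r24-r31 follows the usual SPARC convention.

```python
# Sketch of overlapping register windows. The physical layout is hypothetical.
NWINDOWS = 8
WINDOWED = 16 * NWINDOWS   # 128 windowed registers
GLOBALS = 8


def physical_index(cwp, r):
    """Map register r (0..31) of window cwp to a physical index 0..135."""
    if not 0 <= r <= 31:
        raise ValueError("register number must be 0..31")
    if r < 8:                                   # globals, shared by all windows
        return r
    if r < 16:                                  # outs
        slot = 16 * cwp + 8 + (r - 8)
    elif r < 24:                                # locals
        slot = 16 * cwp + (r - 16)
    else:                                       # ins = outs of window cwp + 1
        slot = 16 * (cwp + 1) + 8 + (r - 24)
    return GLOBALS + slot % WINDOWED


def save(cwp):
    """SAVE decrements the Current Window Pointer (modulo 8)."""
    return (cwp - 1) % NWINDOWS


def restore(cwp):
    """RESTORE increments the Current Window Pointer (modulo 8)."""
    return (cwp + 1) % NWINDOWS


caller = 3
callee = save(caller)
# The caller's first "out" register and the callee's first "in" register
# resolve to the same physical register, so no data is moved on a call.
shared = physical_index(caller, 8) == physical_index(callee, 24)
distinct = {physical_index(w, r) for w in range(NWINDOWS) for r in range(32)}
```

Counting the distinct physical indices over all eight windows gives 136,
which agrees with the 8 global plus 128 windowed registers described above.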


The window overlap also speeds interrupt handling, as interrupts automatically
decrement the CWP, giving the interrupt routine its own window. The SPARC
architecture requires a free window to be available to handle these traps.


Special-Purpose Registers 


The special-purpose registers include the control and status registers defined by 
the SPARC architecture, plus a collection of memory-mapped registers which 
control peripheral functions. 


Special instructions exist for reading and writing each of the SPARC control and 
status registers, except for the Program Counter and the Next Program Counter.
The Y Register can be read and written in user mode; the instructions that access 
the other SPARC-defined registers are privileged. 


The memory-mapped registers can be read and written with the alternate-space 
load and alternate-space store instructions, which are also privileged. 


The SPARC-defined registers, shown in Figure 1-2 above, are: 


• Processor State Register (PSR)—The primary processor control and status
register. It contains mode fields, which are set by the operating system to 
configure the processor, and status fields, which are set by the processor to 
indicate the effects of instruction execution. 


• Window Invalid Mask Register (WIM)—Used by software to detect the
occurrence of register file underflows and overflows. It contains one mask bit 
for each register window. If an operation which normally increments or 
decrements the Current Window Pointer would cause the CWP to point to a 
window whose corresponding WIM bit equals 1, a trap occurs. 


• Trap Base Register (TBR)—Contains three fields used by the processor to
generate the address of the service routine when an interrupt or trap occurs. 


• Y Register—Used in stepwise multiplication and division routines based on
the MULScc and DIVScc instructions. Also used for integer multiply 
operations. 


• Program Counter (PC)—Contains the word address of the instruction
currently being executed by the Integer Unit. The PC cannot be directly read 
or written. 


• Next Program Counter (nPC)—Contains the word address of the next
instruction to be executed, assuming that no trap occurs. The nPC cannot be 
directly read or written. 


• Ancillary State Registers (ASR[31:1])—The SPARC definition includes 31
Ancillary State Registers, 15 of which (ASR[15:1]) are reserved for future use.
The remaining ASRs can be defined and used in any way by SPARC
implementations. SPARClite defines the following ASR:


ASR17—Used to enable and disable single-vector trapping. (When this feature
is enabled, all traps vector to a single location.) Single-vector trapping
provides a small-memory alternative to the standard 1K-word trap table.
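
The role of the Window Invalid Mask in the list above can be sketched as a
check made whenever the Current Window Pointer is about to change. The model
below assumes eight windows and uses the trap names window_overflow and
window_underflow from the SPARC definition; the exception class and function
names are hypothetical.

```python
# Sketch of the WIM check performed when the CWP is about to change.
NWINDOWS = 8


class WindowTrap(Exception):
    """Raised in this model where the processor would take a window trap."""


def save(cwp, wim):
    """SAVE: move to window cwp - 1 unless its WIM bit is set."""
    new_cwp = (cwp - 1) % NWINDOWS
    if (wim >> new_cwp) & 1:
        raise WindowTrap("window_overflow")
    return new_cwp


def restore(cwp, wim):
    """RESTORE: move to window cwp + 1 unless its WIM bit is set."""
    new_cwp = (cwp + 1) % NWINDOWS
    if (wim >> new_cwp) & 1:
        raise WindowTrap("window_underflow")
    return new_cwp


wim = 1 << 0          # software marks window 0 as invalid
cwp = save(2, wim)    # window 1 is valid, so the SAVE succeeds
```

A trap handler would typically spill or fill a window to memory and then
rotate the WIM bit; that part is left out of the sketch.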


The memory-mapped SPARClite-specific registers, shown in Figure 1-2, are: 


• Cache/Bus Interface Unit Control Register—Controls the operation of the data
and instruction caches, and the write and prefetch buffers of the Bus Interface 
Unit. 


• Lock Control Register—Controls the locking of individual entries in the data
and instruction caches. 


• Restore Lock Control Register—Enables or disables the restoration of the Lock
Control Register upon return from an interrupt or a hardware trap. 


• Same-Page Mask Register—Controls the operation of the same-page detection
logic by specifying which bits of the current ASI and address are to be 
compared with those of the previous ASI and address. 


• Address Range Specifier Registers (ARSR[5:1])—Control the assertion of the
Chip-Select outputs (-CS[5:1]). -CSn is asserted when the value on the address
bus falls in the address range specified by ARSRn. -CS0 is asserted on accesses
to the lowest address range in Supervisor Instruction Space.

• Address Mask Registers (AMR[5:0])—AMRn controls the comparison of the
current address with ARSRn by specifying which bits are to be compared and 
which are “don’t cares.” 

• Wait-State Specifier Registers (WSSR[2:0])—Determine, for each address
range, the number of clock cycles between the time an address in that range 
appears on the address bus and the time the processor automatically generates 
the -READY signal. This makes it possible for memory and I/O devices with 
different access times to be connected to the processor without additional 
logic. 

• Timer Register—Contains the current timer count.


• Timer Pre-Load Register—Contains the value which is loaded into the timer
when the timer overflows. 


• System Support Control Register—Enables or disables same-page detection,
chip-select, programmable wait-states, and the timer, independently of one 
another. 
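
The address-range comparison described for the ARSR and AMR registers can be
pictured as a masked compare. The sketch below is a simplification under
stated assumptions: it ignores the ASI, it treats a 1 bit in the mask as
"compare this bit" (the actual polarity and field layout are given in the
register descriptions, not here), and the function name and example values
are hypothetical.

```python
# Sketch of a masked address compare for one chip-select range.
def chip_select_asserted(address, range_value, compare_mask):
    """True when address matches range_value in every bit selected by the mask.

    Bits cleared in compare_mask are treated as "don't cares" (assumed polarity).
    """
    return ((address ^ range_value) & compare_mask) == 0


# Hypothetical example: a 1-Mbyte device decoded at 0x10000000.
RANGE_VALUE = 0x10000000
COMPARE_MASK = 0xFFF00000   # compare the upper 12 bits, ignore the low 20

inside = chip_select_asserted(0x10012345, RANGE_VALUE, COMPARE_MASK)
outside = chip_select_asserted(0x10100000, RANGE_VALUE, COMPARE_MASK)
```

With these values the first address selects the device and the second does
not, because it differs from the range value in a compared bit.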


1.3.4 Data Types 


SPARClite instructions support the Signed Integer, Unsigned Integer, and Tagged 
data formats of the SPARC definition. The Integer types are supported in byte 
(8-bit), half-word (16-bit), word (32-bit), and double-word (64-bit) widths. The 
Tagged type is one word (32 bits) in width. Hardware support is not provided for
the floating-point types; these can be handled in software. 


1.3.5 Instructions 


SPARClite provides an upward-compatible superset of the SPARC (version 8) 
instruction set. The additional instructions—integer divide-step, and scan for first 
changed bit — are supported for the sake of higher performance in embedded 
applications. Table 1-1 lists the SPARClite instruction set. In the MB86930 proces- 
sor, the floating-point and coprocessor instructions defined in the SPARC archi- 
tecture are trapped for software emulation. 


Each instruction is a single 32-bit word. The instruction set can be divided into 
five functional groups: 


1. Logical—Bit-wise boolean operations. Each logical instruction comes in two 
versions: one leaves the integer condition codes in the Processor State Register 
unchanged; the other changes the condition codes as a side effect. 


2. Arithmetic and Shift—Integer arithmetic, logical and arithmetic shifts. Besides 
the standard arithmetic operations, SPARC provides instructions to perform 
tagged arithmetic. In tagged arithmetic, the two least-significant bits of each 
operand are used to indicate the (user-defined) data type of the operand. The 
tagged arithmetic instructions set a condition code if the tag of an operand is 
not zero. 


Besides the arithmetic instructions defined in the SPARC architecture, 
SPARClite provides: 


• A divide-step instruction, which can be used to construct efficient iterative
integer division algorithms. 


• A scan instruction, which determines the first bit in a word which differs
from the most-significant bit. The scan instruction can be used to simplify 
and accelerate many important operations, like normalizing numbers with 
redundant sign bits. 


Most of the arithmetic instructions come in two versions: one of them leaves 
the integer condition codes unchanged, while the other changes the condition 
codes as a side effect of execution. 


3. Control Transfer—Branches, calls, jumps, returns from trap, and conditional 
traps. The target address of the control transfer is computed either by adding a 
specified offset to the value in the Program Counter, or by adding two source 
operands. The transfer of control either occurs immediately after the control 
transfer instruction, or is delayed for one further instruction. 


4. Load and Store—External accesses. Load and store are the only instructions that 
read and write to external devices (including memory). Bytes, half-words, 
words and double words can be transferred to and from processor registers. 
Special instructions access alternate address spaces. Attempts at unaligned
accesses are trapped, and must be carried out under software control. 


5. Read and Write Control Registers—Access the Processor State Register,
Window-Invalid Mask Register, Trap-Base Register, Y Register, and Ancillary State
Registers. There are also instructions for incrementing and decrementing the 
Current Window Pointer. With one exception, writes to the control registers 
are delayed for three instruction cycles. The three instructions following a 
write, therefore, should not attempt to use or modify the values written. A 
write to the Y Register, however, is not delayed: it is completed before the next 
instruction is executed. 
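
Two of the operations in the arithmetic group above lend themselves to a
short behavioral sketch: the scan for the first bit that differs from the
most-significant bit, and tagged addition. The code below models only the
behavior described in the text, not the instruction encodings or operand
formats. The return conventions (bit index counted from the LSB, None when no
bit differs) and the function names are assumptions made for this
illustration, and treating signed overflow as a second cause of the tagged-add
flag follows the SPARC definition rather than this section.

```python
# Behavioral sketches; conventions and names are hypothetical.
MASK32 = 0xFFFFFFFF


def first_differing_bit(word):
    """Index (31 = MSB, 0 = LSB) of the highest bit that differs from the
    most-significant bit, or None if every bit equals the MSB."""
    word &= MASK32
    sign_fill = MASK32 if word >> 31 else 0
    diff = word ^ sign_fill              # 1 wherever a bit differs from the MSB
    return diff.bit_length() - 1 if diff else None


def redundant_sign_bits(word):
    """Number of bits below the MSB that merely repeat the sign bit."""
    index = first_differing_bit(word)
    return 31 if index is None else 30 - index


def tagged_add(a, b):
    """Return (32-bit sum, overflow flag) for a tagged add.

    The flag is set when either operand has a nonzero tag in its two
    least-significant bits, or when the signed addition overflows.
    """
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    tag_nonzero = ((a | b) & 0b11) != 0
    signed_overflow = (~(a ^ b) & (a ^ result) & 0x80000000) != 0
    return result, tag_nonzero or signed_overflow


shift = redundant_sign_bits(0x00000F00)   # left shift that would normalize the value
```

Normalizing a number then amounts to shifting it left by the count of
redundant sign bits, which is the use suggested in the text.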


Table 1-1: Instruction Set 


Group                  Opcode               Operation

Logical                AND (ANDcc)          And (and modify icc)
                       ANDN (ANDNcc)        And Not (and modify icc)
                       OR (ORcc)            Inclusive-Or (and modify icc)
                       ORN (ORNcc)          Inclusive-Or Not (and modify icc)
                       XOR (XORcc)          Exclusive-Or (and modify icc)
                       XNOR (XNORcc)        Exclusive-Nor (and modify icc)

Arithmetic             ADD (ADDcc)          Add (and modify icc)
and Shift              ADDX (ADDXcc)        Add with Carry (and modify icc)
                       TADDcc (TADDccTV)    Tagged Add and modify icc (and Trap on overflow)
                       SUB (SUBcc)          Subtract (and modify icc)
                       SUBX (SUBXcc)        Subtract with Carry (and modify icc)
                       TSUBcc (TSUBccTV)    Tagged Subtract and modify icc (and Trap on overflow)
                       MULScc               Multiply Step and modify icc
                       SMUL                 Signed Multiply
                       UMUL                 Unsigned Multiply
                       SMULcc               Signed Multiply (and modify icc)
                       UMULcc               Unsigned Multiply (and modify icc)
                       DIVScc               Divide-Step (and modify icc)
                       SCAN                 Scan for bit different than MSB
                       SLL                  Shift Left Logical
                       SRL                  Shift Right Logical
                       SRA                  Shift Right Arithmetic

Control                Bicc                 Branch on integer condition codes
Transfer               CALL                 Call
                       JMPL                 Jump and Link
                       RETT                 Return from Trap
                       Ticc                 Trap on integer condition codes

Load                   LDSB (LDSBA)         Load Signed Byte (from Alternate space)
and Store              LDSH (LDSHA)         Load Signed Halfword (from Alternate space)
                       LDUB (LDUBA)         Load Unsigned Byte (from Alternate space)
                       LDUH (LDUHA)         Load Unsigned Halfword (from Alternate space)
                       LDD (LDDA)           Load Doubleword (from Alternate space)
                       STB (STBA)           Store Byte (into Alternate space)
                       STH (STHA)           Store Halfword (into Alternate space)
                       ST (STA)             Store Word (into Alternate space)
                       STD (STDA)           Store Doubleword (into Alternate space)
                       LDSTUB (LDSTUBA)     Atomic Load-Store Unsigned Byte (in Alternate space)
                       SWAP (SWAPA)         Swap r Register with Memory (in Alternate space)

Read and               SAVE                 Save caller's window
Write                  RESTORE              Restore caller's window
Control                RDY                  Read Y Register
Registers              RDPSR                Read Processor State Register
                       RDWIM                Read Window Invalid Mask Register
                       RDTBR                Read Trap Base Register
                       RDASR                Read Ancillary State Register
                       WRY                  Write Y Register
                       WRPSR                Write Processor State Register
                       WRWIM                Write Window Invalid Mask Register
                       WRTBR                Write Trap Base Register
                       WRASR                Write Ancillary State Register
                       UNIMP                Unimplemented instruction


1.3.6 Data and Instruction Caches 


Each member of the SPARClite family contains separate data and instruction 
caches on-chip. In the MB86930 processor, each cache is 2 Kbytes in size, orga- 
nized into two banks of sixty-four 4-word lines. Each cache line has a 22-bit 
address tag, which indicates the memory location to which the line is currently 
mapped. A cache line, together with its address tag and status bits, is often called 
a cache entry. The organization of each cache is two-way set associative; that is, each 
address in memory can be mapped to either of two locations in the cache. 
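
The cache geometry above determines how a 32-bit address divides into a tag,
a line index, and a byte offset. The sketch below assumes the conventional
split (offset in the low bits, then the index, then the tag), which is
consistent with the 22-bit tag but is not spelled out in this section; the
function name is hypothetical.

```python
# Sketch: derive the address split from the stated cache geometry.
WORD_BYTES = 4
LINE_BYTES = 4 * WORD_BYTES          # 4-word lines -> 16 bytes
LINES_PER_BANK = 64
BANKS = 2
CACHE_BYTES = BANKS * LINES_PER_BANK * LINE_BYTES

OFFSET_BITS = LINE_BYTES.bit_length() - 1        # 4
INDEX_BITS = LINES_PER_BANK.bit_length() - 1     # 6
TAG_BITS = 32 - INDEX_BITS - OFFSET_BITS         # 22


def split_address(address):
    """Return (tag, index, offset) for a 32-bit address (assumed bit layout)."""
    offset = address & (LINE_BYTES - 1)
    index = (address >> OFFSET_BITS) & (LINES_PER_BANK - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
```

Two addresses that differ only in their tag bits compete for the same pair of
entries, one in each bank, which is what two-way set associativity means here.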


There are three modes of cache operation: normal, global locking, and local locking. 
In normal mode, when the integer unit requests a read to a data or instruction 
address which is not found in the appropriate cache, the memory block contain- 
ing the requested address is read into the cache, replacing one of the current cache 
entries. The locking modes prevent either an entire cache, or just selected
entries, from being overwritten in this way. The locking modes thus allow
time-critical routines to be locked into cache. Thanks to the set-associative
organization, as much as one whole bank of a cache can be locked while the
remaining bank continues to operate as a direct-mapped cache.


In normal mode, the data cache uses a write-through update policy, and allocates 
a cache entry only on a load. Writes to locked data entries, however, are not writ- 
ten through to main memory. In this way, a portion of the data cache can be used 
as fast on-chip RAM which is not mapped to external memory. 


Cache tags and data are memory-mapped, and can be directly read and written
using the alternate-space load and store instructions. These instructions are
privileged.


Subsequent chapters discuss the cache in greater detail: Programmer's Model dis- 
cusses cache locking; Programming Considerations contains hints for using the on- 
chip cache to best advantage. 


1.3.7 Interrupts and Traps 


In this manual, we distinguish between interrupts—which are initiated by exter- 
nal interrupt signals, asynchronously with respect to processor operations, and 
traps—which are caused by instructions, and so are necessarily synchronous. Dur- 
ing system operation, external interrupts are generally unavoidable; traps, how- 
ever, can and should be kept to a minimum by careful software design and 
testing. 


Interrupt response time is critical in many embedded applications. The total 
response time includes the time required for the processor to finish its current 
task after recognizing an interrupt, and the time required to switch contexts (if 
necessary) and begin executing the interrupt service routine. In the SPARClite
family, non-interruptible multi-cycle events are minimized; for example, cache
refills, which take multiple cycles to fill a cache line, are designed so that
they can be interrupted after every word load. This reduces both average and maximum
interrupt latency. When an interrupt is detected, the processor switches to a new 
window. In this way, the current values in the general-purpose registers don’t 
have to be saved before interrupt service begins. Furthermore, service routines 
can be locked into the cache, making them available for immediate access. 


The MB86930 processor provides direct support for 15 distinct interrupt priority 
levels; each level can service multiple interrupt sources. Supervisor-mode soft- 
ware can mask up to 14 of these levels; the highest level is non-maskable (if 
ET=1). 
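
The masking rule can be written as a small predicate. The sketch below
follows the SPARC version 8 definition, in which a Processor Interrupt Level
(PIL) field in the PSR masks every level at or below its value and level 15
is never masked while traps are enabled; the PIL field is not named in this
section, so treat that detail, and the function name, as assumptions.

```python
# Sketch of the interrupt acceptance rule (per the SPARC V8 definition).
def interrupt_accepted(level, pil, traps_enabled):
    """True if an interrupt request at `level` (1..15) would be taken.

    pil           -- current Processor Interrupt Level, 0..15
    traps_enabled -- state of the ET bit in the PSR
    """
    if not 1 <= level <= 15:
        raise ValueError("interrupt level must be 1..15")
    if not 0 <= pil <= 15:
        raise ValueError("PIL must be 0..15")
    if not traps_enabled:
        return False                 # nothing is taken while ET = 0
    return level == 15 or level > pil


# With PIL = 15, levels 1 through 14 are masked and only level 15 remains.
masked_levels = [lvl for lvl in range(1, 16)
                 if not interrupt_accepted(lvl, 15, True)]
```

Setting the PIL to its maximum therefore masks 14 levels, which is the
"up to 14" quoted above.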


An interrupt or trap (other than reset) causes control to be transferred to an
address generated by the Trap Base Register. One field in the TBR contains the
base address of the trap dispatch table. Normally, an 8-bit trap type number
serves as an offset into this table. When single-vector trapping is enabled,
however, control passes to the base address of the trap table (with tt=0),
regardless of the trap type. Reset always traps to address 0.
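
The address generation just described can be sketched as follows. The field
positions (table base in bits 31:12, trap type in bits 11:4, low four bits
zero) come from the SPARC version 8 definition of the TBR rather than from
this section, and the function name and example base address are
hypothetical.

```python
# Sketch of trap-vector generation (field layout per the SPARC V8 TBR).
ENTRY_BYTES = 16                     # each table entry holds four instructions
TABLE_ENTRIES = 256                  # one entry per 8-bit trap type


def trap_vector(trap_base, tt, single_vector=False):
    """Address to which control passes for trap type tt (0..255)."""
    if not 0 <= tt <= 0xFF:
        raise ValueError("trap type must fit in 8 bits")
    base = trap_base & 0xFFFFF000    # table is aligned on a 4-Kbyte boundary
    if single_vector:
        return base                  # every trap vectors to the table base
    return base | (tt << 4)


normal = trap_vector(0x40000000, 0x05)
single = trap_vector(0x40000000, 0x05, single_vector=True)
table_words = TABLE_ENTRIES * ENTRY_BYTES // 4
```

The full table occupies 256 entries of four words each, that is 1K words,
which is the space that single-vector trapping avoids reserving.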


Up to 256 trap types can be distinguished on the basis of the 8-bit trap type num- 
ber. Of these, half are reserved for hardware interrupts and traps; all but one of 
the others are programmer-initiated (see the discussion of the Ticc instruction in 
the Programmer’s Model chapter). One trap type is defined in SPARClite to support 
in-circuit emulation. The various trap types are listed, in order of priority, in 
Table 1-2. 


Table 1-2: Trap Types and Priorities

Trap

reset
instruction_access_exception
privileged_instruction
illegal_instruction
fp_disabled
cp_disabled
window_overflow
window_underflow
mem_address_not_aligned
data_access_exception
tag_overflow
trap_instruction (Ticc)
instruction_breakpoint
data_breakpoint
interrupt_level_15
interrupt_level_14
interrupt_level_13
interrupt_level_12
interrupt_level_11
interrupt_level_10
interrupt_level_9
interrupt_level_8
interrupt_level_7
interrupt_level_6
interrupt_level_5
interrupt_level_4
interrupt_level_3
interrupt_level_2
interrupt_level_1


The expression trapped instruction refers, in the case of a synchronous trap, to the 
instruction which caused it. In the case of an interrupt, the trapped instruction is 
the one which was about to execute when the interrupt occurred. 


The Integer Unit supports precise traps—when an interrupt or trap occurs, the 
saved state of the processor reflects the completion of all instructions prior to the 
trapped instruction, but no subsequent instructions (including the trapped 
instruction). Hardware guarantees that upon return from the service routine, the 
Program Counter points to the trapped instruction or the following instruction if 
the trapped instruction was emulated. 


1.4 Internal Architecture 


The internal architecture of SPARClite family processors is illustrated in 
Figure 1-4. The processor core consists of an Integer Unit which implements a 
superset of the SPARC integer instruction set. Separate on-chip caches are pro- 
vided for data and instructions. The Bus Interface Unit handles the interface 
between the processor and the system. A Clock Generator with built-in phase- 
locked loop simplifies system clock design. Finally, the Debug Support Unit pro- 
vides hardware support for in-circuit emulation. Internally, the various functional 
units are connected by separate instruction and data buses. For connection with 
external memory and I/O, a unified address bus and a unified data bus are 
extended off-chip. The main functional units are discussed briefly below, and 
more fully in the Internal Architecture chapter. 


[Block diagram. Blocks: Clock Generator, SPARC Integer Unit, 2K Instruction 
Cache, 2K Data Cache, Bus Interface Unit, DRAM Support, Address Decode, 
16-Bit Timer. External connections: CLK_OUT, ADDRESS, DATA, ASI, CONTROL, 
CHIP_SEL, PAGE_DET, REFRESH.] 



Figure 1-4. Internal Architecture (Block Diagram) 


1.4.1 Integer Unit 


The Integer Unit (IU) is a compact, fully custom implementation of the SPARC 
architecture. The IU is hard-wired for high performance. Its internal functional 
units are designed around a modular architecture and can be customized to meet 
different application requirements. In the MB86930, for example, this flexibility 
was used to provide direct hardware support for integer multiplication, and to 
extend the SPARC instruction set by supporting divide-step and scan instruc- 
tions. 


The IU implements a five-stage instruction pipeline to allow a sustained execu- 
tion rate of nearly one instruction per cycle. The operation of the pipeline under 
ideal conditions is illustrated in Figure 1-5. The pipeline consists of the following 
stages: 


• Fetch (F)—One of the instruction memory spaces is addressed and returns an 
instruction. (Figure 1-5 below assumes a hit in the instruction cache.) 


• Decode (D)—The instruction is decoded; the register file is addressed and 
returns operands. 


• Execute (E)—The ALU computes a result. 


• Memory (M)—External memory is addressed (for load and store instructions 
only; this stage is idle for other instructions). 


• Writeback (W)—The result (or loaded memory datum) is written into the 
register file. 






Stage        Instruction in the stage (during one CLK cycle) 

Fetch        Instruction 5 
Decode       Instruction 4 
Execute      Instruction 3 
Memory       Instruction 2 
Write-Back   Instruction 1 


Figure 1-5. Instruction Pipeline 


No instructions execute out-of-order; that is, if instruction A enters the pipeline 
before instruction B, then instruction A necessarily reaches the writeback stage 
before instruction B does. Conditions which hold up the pipeline, and the effect of 
traps on pipeline operations, are discussed in the Internal Architecture chapter. 
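

The ideal-case behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: instructions are numbered from 1, each one enters Fetch one cycle after its predecessor, and stalls, cache misses, and traps are ignored. The function names are hypothetical and are not part of any SPARClite tool.

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]


def pipeline_occupancy(num_instructions, cycle):
    """Map each stage to the instruction number it holds in `cycle`.

    Ideal case only: instruction i (numbered from 1) enters Fetch in
    cycle i and advances one stage per cycle. Stalls, cache misses,
    and traps are not modeled.
    """
    occupancy = {}
    for depth, stage in enumerate(STAGES):
        instruction = cycle - depth
        in_flight = 1 <= instruction <= num_instructions
        occupancy[stage] = instruction if in_flight else None
    return occupancy


def writeback_order(num_instructions):
    """List instruction numbers in the order they reach Writeback."""
    order = []
    last_cycle = num_instructions + len(STAGES) - 1
    for cycle in range(1, last_cycle + 1):
        instruction = pipeline_occupancy(num_instructions, cycle)["Writeback"]
        if instruction is not None:
            order.append(instruction)
    return order


if __name__ == "__main__":
    # Cycle 5 corresponds to the snapshot in Figure 1-5.
    print(pipeline_occupancy(8, 5))
    print(writeback_order(4))
```

In this model the writeback order always equals the issue order, which is the in-order property stated above.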


1.4.2 Data and Instruction Caches 


The on-chip data and instruction caches allow designers to build high-perfor- 
mance systems without incurring the cost of fast external memory and the 
associated control logic. 


In the MB86930 processor, each cache is 2 Kbytes in size, organized into two 
banks of sixty-four 16-byte lines. Cache lines are refilled in 4-byte increments to 
avoid the interrupt latency incurred by long, uninterruptible cache line replace- 
ments. 


The data and instruction caches are accessed independently over separate data 
and instruction buses, allowing data to be loaded from and stored to cache 
concurrently with instruction fetches. 
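

The cache dimensions quoted above can be checked with simple arithmetic. The sketch below also shows one possible way an address could be split into tag, line index, and byte offset. That split is an assumption made for illustration (each bank indexed by address bits 9:4); the actual lookup scheme is described in the Internal Architecture chapter, not here.

```python
LINE_BYTES = 16        # bytes per cache line (from the text)
LINES_PER_BANK = 64    # lines per bank (from the text)
BANKS = 2              # banks per cache (from the text)
REFILL_BYTES = 4       # bytes moved per refill step (from the text)


def cache_size_bytes():
    """Total capacity of one cache."""
    return BANKS * LINES_PER_BANK * LINE_BYTES


def refill_steps_per_line():
    """Number of 4-byte transfers needed to fill one line."""
    return LINE_BYTES // REFILL_BYTES


def split_address(address):
    """Split a 32-bit address into (tag, index, offset).

    Assumption for illustration: bits 3:0 select the byte in a line,
    bits 9:4 select one of 64 lines in each bank, and the remaining
    bits form the tag.
    """
    offset = address & (LINE_BYTES - 1)
    index = (address >> 4) & (LINES_PER_BANK - 1)
    tag = address >> 10
    return tag, index, offset


if __name__ == "__main__":
    print(cache_size_bytes(), refill_steps_per_line())
    print(split_address(0x12345678))
```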





1.4.3 Bus Interface Unit 


The Bus Interface Unit (BIU) contains the logic which allows the processor to 
communicate with the system. The BIU receives requests for external memory 
and I/O accesses from the cache control logic. When the BIU performs a read, it 
returns the data to both the cache and the IU. Parallel paths make the data avail- 
able to the IU in the same cycle that it is written to the cache. 


The BIU has a one-word (32-bit) write buffer to hide external memory latency 
from the IU. The BIU also has a one-word prefetch buffer for instruction fetches. 
These buffers are enabled or disabled by bits in the Cache/Bus Interface Unit 
Control Register. 


1.4.4 Debug Support Unit 


The Debug Support Unit supports hardware emulation with on-chip breakpoint 
and single-step logic. A dedicated emulator bus is extended off-chip from the 
debug unit; the emulator bus makes it possible to trace transactions between the 
Integer Unit and on-chip cache. 


1.5 External Interface 


The processor’s external interface consists of signals, bus operations, and system 
support functions. This section gives an overview; details are discussed more 
fully in the External Interface chapter. The System Design Considerations chapter 
discusses issues that are likely to arise in the design of any SPARClite system. 


1.5.1 Signals 


The processor’s external signals, illustrated in Figure 1-6, can be grouped by 
function: 

• Processor Control and Status—Reset, error, and clock signals. 


• Memory Interface—Data and address buses, ASI and byte-enables, chip- 
selects, and other control signals used to access external memory and 
memory-mapped devices. 


• Bus Arbitration—Signals used by external devices in requesting, and by the 
processor in granting, control of the bus. 


• Peripheral Functions—Interrupt-requests and timer overflow. 


• Emulator Bus—Signals to support in-circuit emulation. 


• Boundary-Scan—Test signals used for hardware verification. 


MB86930 I/O Signals 

Processor Control & Status:   -CLK_EXT, CLKOUT1, CLKOUT2, CLKIN / XTAL1, 
                              -ERROR, -RESET 
Memory Interface:             ASI <7:0>, D <31:0>, ADR <31:2>, -CS <5:0>, 
                              -BE <3:0>, -MEXC, -READY, RD/-WR, -LOCK, -AS, 
                              -SAME_PAGE 
Peripheral Functions:         IRL <3:0>, -TIMER_OVF 
Bus Arbitration:              -BREQ, -BGRNT 
Test Pins (Boundary Scan):    TDO, TCK, TMS, TDI, -TRST 
Emulator:                     EMU_SD <3:0>, EMU_D <3:0>, -EMU_BRK, -EMU_ENB 


Figure 1-6. Input and Output Signals 


1.5.2 Bus Operation 


At any given time, the Bus Interface Unit is handling requests for external mem- 
ory and I/O operations, arbitrating for bus access, or idle. From the point of view 
of the external system, bus transactions are handled in fairly standard ways: 


• Memory and I/O Operations—Read and write transactions are initiated with 
the BIU asserting the -AS signal. The RD/-WR output indicates the 
transaction type. The -BE[3:0] outputs indicate the transaction width. The BIU 
drives the address and ASI signals, and either drives (on stores) or reads (on 
loads) the signals on the data bus. The transaction ends when the external 
system or programmable wait-state generator asserts -READY. 


An atomic load-store is executed as a load followed by a store, with no opera- 
tion allowed in between. The -LOCK output is asserted to indicate that the bus 
is being used for more than one consecutive memory operation. 


• Arbitration—Any external device can request ownership of the bus by 
asserting the -BREQ signal. The BIU three-states its bus drivers and asserts 
-BGRNT to indicate that it is relinquishing control of the bus. On completion 
of its transaction, the external device de-asserts -BREQ; the BIU responds by 
de-asserting -BGRNT in the following cycle. 


The External Interface chapter gives further details concerning bus operations, 
with timing diagrams, a bus state diagram, and a discussion of transactions that 
are interrupted by exceptions. 


1.5.3 System Support Functions 


Built-in system support functions help to minimize the amount of glue logic 
required in the external system. The support includes a set of system-configura- 
tion registers, a timer for generating refresh requests, and same-page detection 
logic. 


The system-configuration registers (Address Range Specifiers, Address Masks, 
and Programmable Wait-State Specifiers) allow software to define six different 
address ranges. When an address driven by the processor is in one of these 
ranges, the corresponding Chip-Select (-CS) pins are asserted. After a number of 
clock cycles determined by the corresponding Programmable Wait-State Speci- 
fier, the processor automatically generates the -READY signal. This makes it pos- 
sible for memory and I/O devices with different access times to be connected to 
the processor without additional logic. 


The programmable timer causes the -TIMER_OVF output signal to be asserted at 
software-defined intervals. This signal can be used to initiate DRAM refresh 
cycles, or to control other periodic events in the external system. 


The same-page detection logic determines whether the address of the current 
memory transaction is on the same page as the previous transaction. If it is, the 
processor asserts the -SAME_PAGE signal. The system can then take advantage 
of the fast consecutive accesses possible within the page boundaries of fast-page 
mode DRAM. 
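

The same-page comparison can be modeled as shown below. This is a behavioral sketch only. The class name and the `page_bytes` parameter are hypothetical, and the 2048-byte default is an arbitrary example value; how the page size is actually configured is covered in the External Interface chapter.

```python
class SamePageDetector:
    """Behavioral model of same-page detection between transactions."""

    def __init__(self, page_bytes=2048):
        # Hypothetical parameter: the DRAM page size, a power of two.
        if page_bytes <= 0 or page_bytes & (page_bytes - 1):
            raise ValueError("page_bytes must be a power of two")
        self.page_shift = page_bytes.bit_length() - 1
        self.previous_page = None  # no transaction seen yet

    def access(self, address):
        """Return True if this access hits the same page as the last one."""
        page = address >> self.page_shift
        same_page = page == self.previous_page
        self.previous_page = page
        return same_page


if __name__ == "__main__":
    detector = SamePageDetector(page_bytes=2048)
    for address in (0x1000, 0x17FC, 0x1800):
        print(hex(address), detector.access(address))
```

A True result here corresponds to the processor asserting -SAME_PAGE for that transaction.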


1.6 Development-Support Tools 


A full range of development tools is available to support the development of 
your SPARClite application. The emergence of SPARC as the industry-standard 
engineering workstation architecture provides a fully supported and cost-effective 
source of native development environments. Tools targeted at 
embedded systems development are available as well. 


Solutions are available to meet your emulation, logic analysis, logic modeling, 
architectural simulation, real-time operating system, PC environment, bench- 
marking and prototyping requirements. Call the SPARClite customer hotline for a 
complete list of support solutions. 




SPARClite User’s Guide 







Programmer's Model 


This chapter presents the SPARClite processor architecture as a collection of 
resources available to software. It discusses the user and supervisor modes, the 
organization of the address space, the processor registers, the supported data 
types, the instruction set, the on-chip caches, interrupts and traps, and debug 
support. A separate section describes the internal state of the processor after reset. 


The Programming Considerations chapter contains information about how to use 
these processor resources to best advantage. 


2.1 Program Modes 


The SPARC architecture provides two mutually exclusive modes of program exe- 
cution, user mode and supervisor mode. The processor is in supervisor mode when 
the S bit of the Processor State Register (PSR) is 1, and in user mode when this bit 
is 0. Instructions which access either special-purpose registers or alternate mem- 
ory spaces are privileged; the use of privileged instructions is restricted to supervi- 
sor mode. 


The distinction between user and supervisor modes provides system protection 
in multitasking environments. System code runs in supervisor mode and has full 
access to processor resources, while application code runs in user mode and is 
kept from having unwanted side effects. Embedded systems connected to a net- 
work can use a protection scheme based on the distinction between user and 
supervisor modes. In such a scheme, network service routines intended to have 
system-wide effects run in supervisor mode. Routines intended to have only local 
effects, on the other hand, run in user mode. 


In many embedded systems, however, this hierarchy is not required, and the pro- 
cessor can operate exclusively in supervisor mode. In this way, application code 
can directly manipulate the Current Window Pointer (in the PSR) and other pro- 
cessor control fields. 


On reset, the processor is in supervisor mode. To enter user mode, software must 
clear the S bit in the PSR. The processor enters supervisor mode from user mode 
only when a hardware reset, an interrupt, or a trap occurs. A return from trap 
(RETT) instruction restores the value the S bit had before the trap was taken. 


2.2 Memory Organization 



The processor can directly address up to 1 Terabyte of memory, organized into 
256 address spaces of 4 GB each. These address spaces may or may not overlap in 
physical memory, depending on the system design. Every external access 
involves an 8-bit Address Space Identifier (ASI) as well as a 32-bit address. The 
ASI selects one of the address spaces, and the address selects a word within that 
space (see Table 2-1). Only the user instruction and data spaces are available in 
user mode; accessing any of the other 254 address spaces requires the processor to 
be in supervisor mode. 


Table 2-1: ASI Address Space Map 


ASI <7:0>        Address Space 

0x1              Control Register 
0x2              Instruction Cache Lock 
0x3              Data Cache Lock 
0x4 - 0x7        Application Definable 
0x8              User Instruction Space 
0x9              Supervisor Instruction Space 
0xA              User Data Space 
0xB              Supervisor Data Space 
0xC              Instruction Cache Tag RAM 
0xD              Instruction Cache Data RAM 
0xE              Data Cache Tag RAM 
0xF              Data Cache Data RAM 
0x10 - 0xFE      Application Definable 
0xFF             Reserved for Debug Hardware 
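

As a quick check of the figures above, the following sketch computes the total address range from the ASI and address widths, and models the rule that only the two user spaces are reachable in user mode. The function and constant names are chosen for illustration only.

```python
ASI_BITS = 8        # width of the Address Space Identifier
ADDRESS_BITS = 32   # width of the address within one space

USER_INSTRUCTION_SPACE = 0x08
USER_DATA_SPACE = 0x0A


def total_addressable_bytes():
    """256 address spaces of 4 GB each."""
    return (1 << ASI_BITS) * (1 << ADDRESS_BITS)


def access_allowed(asi, supervisor):
    """Return True if the given ASI may be accessed in the given mode."""
    if not 0 <= asi < (1 << ASI_BITS):
        raise ValueError("ASI must fit in 8 bits")
    if supervisor:
        return True
    return asi in (USER_INSTRUCTION_SPACE, USER_DATA_SPACE)


if __name__ == "__main__":
    print(total_addressable_bytes() == 2 ** 40)   # 1 Terabyte
    print(access_allowed(0x0A, supervisor=False))
    print(access_allowed(0x0B, supervisor=False))
```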





Loads and stores are the only instructions that cause external accesses. Versions of 
these instructions exist for transferring bytes, half-words, words and double 
words between memory (or I/O) and processor registers. Addressing conven- 
tions for external accesses are “big-endian”: 


• Bytes—Increasing the address decreases the significance of a byte within the 
word. That is, the most significant byte of a word—the “big end” of the 
word—is accessed when bits [1:0] of the address are both 0. The least 
significant byte is accessed when address bits [1:0] are both 1. 


• Halfwords—The most significant halfword of a word is accessed when bit 1 of 
the address is 0, and the least significant halfword when address bit 1 is 1. 


• Doublewords—The most significant word of a doubleword is accessed when bit 
2 of the address is 0, and the least significant word when address bit 2 is 1. 


The address of a halfword, word, or doubleword is the address of its most signifi- 
cant byte. The addressing conventions are illustrated in Figure 2-1. 


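Bytes:        address <1:0> = 0, 1, 2, 3 select bits 31-24, 23-16, 15-8, 7-0 of the word 
Halfwords:    address <1:0> = 0, 2 select bits 31-16, 15-0 of the word 
Word:         address <1:0> = 0 selects bits 31-0 
Doubleword:   address <2:0> = 0, 4 select bits 63-32, 31-0 of the doubleword 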


Figure 2-1. Addressing Conventions 
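

The big-endian conventions above can be expressed as small helper functions. This is an illustrative sketch; the function names are not taken from any SPARClite software.

```python
def byte_at(word, address):
    """Byte of a 32-bit word selected by address bits [1:0]."""
    return word.to_bytes(4, "big")[address & 0b11]


def halfword_at(word, address):
    """Halfword of a 32-bit word selected by address bit 1."""
    shift = 0 if address & 0b10 else 16   # bit 1 clear: most significant half
    return (word >> shift) & 0xFFFF


def word_of_doubleword(doubleword, address):
    """Word of a 64-bit doubleword selected by address bit 2."""
    shift = 0 if address & 0b100 else 32  # bit 2 clear: most significant word
    return (doubleword >> shift) & 0xFFFFFFFF


if __name__ == "__main__":
    word = 0x11223344
    print([hex(byte_at(word, a)) for a in range(4)])
    print(hex(halfword_at(word, 0)), hex(halfword_at(word, 2)))
    print(hex(word_of_doubleword(0x1122334455667788, 4)))
```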


Load and store operations require proper alignment of data in memory. An 
aligned doubleword address is divisible by 8, an aligned word address is divisi- 
ble by 4, and an aligned half-word address is divisible by 2. If a load or store 
instruction generates an improperly aligned address, a mem_address_not_aligned 
trap occurs, and the access must be performed piecemeal under software 
control. 
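

The alignment rule amounts to a divisibility test, sketched below. The second function lists the byte addresses that a software handler would have to touch one at a time for a misaligned access. It is a hypothetical helper for illustration, not a description of an actual trap handler.

```python
ACCESS_BYTES = {"byte": 1, "halfword": 2, "word": 4, "doubleword": 8}


def is_aligned(address, kind):
    """True if `address` is properly aligned for an access of type `kind`."""
    return address % ACCESS_BYTES[kind] == 0


def piecemeal_addresses(address, kind):
    """Byte addresses covered by the access, for byte-by-byte handling."""
    return [address + i for i in range(ACCESS_BYTES[kind])]


if __name__ == "__main__":
    print(is_aligned(0x1004, "word"))        # word address divisible by 4
    print(is_aligned(0x1004, "doubleword"))  # not divisible by 8
    print([hex(a) for a in piecemeal_addresses(0x1001, "word")])
```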


The processor does not contain memory-management hardware; virtual-address 
translation can be handled by software, or by an external memory-management 
unit. 


2.3 Registers 


There are two types of registers: the general-purpose, or r registers, whose contents 
have no pre-assigned meaning, and the special-purpose registers, which contain 
control and status information, or special-purpose data. All registers are 32 bits 
wide. The register set is illustrated in Figure 1-2 of the Overview chapter. 


The general-purpose (r) registers can be accessed in user mode. There are 136 r 
registers; 8 of them are global registers; the other 128 are divided into 8 overlapping 
blocks, called windows. The windowing system, and the special uses of certain r 
registers, are discussed below. 


The special-purpose registers are of two kinds: (1) registers defined by the SPARC 
architecture, and (2) memory-mapped registers which control peripheral func- 
tions. Special instructions exist for reading and writing each of the SPARC regis- 
ters, except for the Program Counter and the Next Program Counter. The 
memory-mapped registers can be read and written with the alternate-space load 
and store instructions. Except for reads and writes to the SPARC-defined Y regis- 
ter, all of the instructions which access special-purpose registers are privileged. 


2.3.1 Register Windows 


As specified by the SPARC architecture, the general-purpose register set is orga- 
nized into a set of 8 global registers, plus a collection of overlapping windows. In 
the MB86930, there are 8 such windows. Each window contains 24 registers. Of 
these, 8 are local to the window, 8 are “out” registers shared with the adjacent win- 
dow below, and 8 are “in” registers shared with the adjacent window above. This 
organization is illustrated in Figure 2-2. 


At any given time, 32 general-purpose registers can be accessed directly: the 8 
global registers, and the 24 registers of the currently active window. The value in 
the Current Window Pointer (CWP) field of the Processor State Register (PSR) 
determines which window is active. (See Section 5.3 for register addressing con- 
ventions.) 


Figure 2-2. Register Windows 


Register Addressing 


There are up to three address fields associated with a SPARC instruction. In the 
case of a three-address instruction, these are the rs1 field, the rs2 field, and the rd 
field. Rs1 and rs2 are the logical register addresses of the two source operands of the 
instruction while rd is the logical register address of the destination operand. 
These addresses specify the location of the operands within the context of the cur- 
rent window, as shown in Table 2-2. 


Table 2-2: Logical Register Addressing 


Logical Register Address     Register 

r[0] - r[7]                  global[0] - global[7] 
r[8] - r[15]                 out[0] - out[7] 
r[16] - r[23]                local[0] - local[7] 
r[24] - r[31]                in[0] - in[7] 





The CWP field of the PSR register points to the current window. The combination 
of a logical register address with the CWP produces a physical register address. 
Physical register addresses are directly decoded by the Register File. Doubleword 
operands in the register file are assumed to have even-odd alignment. The even 
numbered register contains the most significant 32 bits of the doubleword. 
Instructions which act on doublewords must specify even-numbered register 
addresses. 
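

The combination of a logical register address with the CWP can be illustrated with the sketch below. The physical numbering used here is an assumption chosen for illustration (globals first, then 16 registers per window, wrapping modulo 128); the actual register file layout is not specified in this section. What the sketch does preserve is the overlap described in this chapter: the "out" registers of one window are the "in" registers of the window selected after a SAVE.

```python
NWINDOWS = 8                  # windows implemented in the MB86930
GLOBALS = 8                   # r[0] - r[7]
WINDOWED = 16 * NWINDOWS      # 128 windowed registers


def physical_register(logical, cwp):
    """Map logical register r[logical] to an illustrative physical index."""
    if not 0 <= logical <= 31:
        raise ValueError("logical register address must be 0..31")
    if not 0 <= cwp < NWINDOWS:
        raise ValueError("CWP must be 0..7")
    if logical < GLOBALS:
        return logical        # globals do not depend on the CWP
    # Assumed layout: outs and locals of window w occupy 16 consecutive
    # slots; the ins of window w are the outs of window w + 1.
    return GLOBALS + (16 * cwp + (logical - GLOBALS)) % WINDOWED


def cwp_after_save(cwp):
    """SAVE decrements the CWP modulo the number of windows."""
    return (cwp - 1) % NWINDOWS


if __name__ == "__main__":
    caller_cwp = 3
    callee_cwp = cwp_after_save(caller_cwp)
    # out[0] of the caller (r[8]) is in[0] of the callee (r[24]).
    print(physical_register(8, caller_cwp) == physical_register(24, callee_cwp))
```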


Since the CWP is part of the PSR, software can change its value. 
In particular, the WRPSR, SAVE, RESTORE, and RETT 
instructions can change the CWP. See the Instructions section below for details. 
Hardware also can change the CWP when a trap or interrupt occurs. See the Traps 
and Interrupts section. 


Performance Features 


The overlap between adjacent windows makes it easy to pass parameters to a 
subroutine. Values to be passed should be written to the “out” registers of the 
current window, which are the same as the “in” registers of the adjacent window. 
A SAVE instruction can then be used to decrement the Current Window Pointer, 
making the parameter values available to the subroutine without moving any 
data. 


Register windows improve performance in embedded applications because they 
function as local variable caches which retain interrupt, subroutine, context, 
or operating-system variables with no additional overhead. Since procedure calls 
are efficient, optimizing compilers are not forced to replace them with inlined 
macros; this reduces the size of the compiled code, saving memory space, and 
making it possible to fit more complicated routines in the instruction cache. 


Register windows can be dedicated to individual contexts to enable very fast 
switching between contexts. When handling interrupts, the hardware immedi- 
ately moves to the adjacent window to start executing the service routine. In this 
way, an unused set of registers is made available in less than 3 processor cycles. 


Each register in the register file has three read-only and one write-only port. The 
four-port structure allows even store instructions—which may require three oper- 
ands to be read out of the register file—to be completed in a single cycle. 


2.3.2 Special Uses of the r Registers 


Four of the r registers have special uses defined in the SPARC architecture: 


• When global register 0 (r[0]) is addressed as a source operand, the constant 
value 0 is read. When r[0] is used as a destination operand, the data written is 
discarded, and no r register changes value. 


• The CALL instruction writes its own address into out register 7 (r[15]). 


• When a trap is taken, the current window pointer is decremented. The 
program counters PC and nPC are then automatically written into local 
registers 1 and 2 (r[17] and r[18]) of the new register window. 


2.3.3 SPARC-Defined Special-Purpose Registers 


The registers discussed in this section are defined as part of the SPARC architec- 
ture. 


Processor State Register (PSR) 


The Processor State Register is the primary processor control and status register. 
It contains 11 mode and status fields which configure the processor and report 
processor status and exception results. The mode fields, shown in upper case in 
Figure 2-3, are set by the operating system to configure the processor. The status 
fields, shown in lower case, are set by the processor to indicate the effects of 
instruction execution. 


Except for several fields described below, the PSR can be written and read 
directly with the privileged instructions WRPSR and RDPSR. The PSR can also be 
modified by the SAVE, RESTORE, Ticc, and RETT instructions, and by any 
instruction that modifies the condition codes. 


Bit:     31-28   27-24   23-20   19-12      11-8   7   6    5    4-0 
Field:   impl    ver     icc     reserved   PIL    S   PS   ET   CWP 

Figure 2-3. Processor State Register 


Bits 31-28: Implementation (impl)—Identifies the implementation number of the processor. In the 
MB86930 processor, it is hardwired to 0. The value in this field cannot be changed by a 
WRPSR instruction. 


Bits 27-24: Version (ver)—Identifies the processor version, and is intended for factory use. It can be 
read, but not written. The Version field is hardwired to 2 in the MB86930 processor. 


Bits 23-20: Integer Condition Codes (icc)—Contains the negative (n), zero (z), overflow (v), and carry 
(c) integer condition-code flags. These bits are modified by the WRPSR instruction, and 
by arithmetic and logical instructions whose names end with the letters cc (for example, 
ANDcc). The Bicc (Branch on integer condition codes) and Ticc (Trap on integer condi- 
tion codes) instructions transfer program control based on the values of these bits. The 
integer condition code flags are defined as follows: 


n (Bit 23) Set to 1 if the ALU result was negative for the last instruction that modified 
the icc field; equal to 0 otherwise. 


z (Bit 22) Set to 1 if the ALU result was zero for the last instruction that modified the icc 
field; equal to 0 otherwise. 


v (Bit 21) If this bit equals 1, an arithmetic overflow occurred on the last instruction that 
modified the icc field; it equals 0 otherwise. Logical instructions that modify 
the icc field always reset the overflow bit to 0. 


c (Bit 20) If this bit equals 1, either an arithmetic carry out of bit 31 occurred on the last 
addition that modified the icc, or a borrow out of bit 31 occurred as the result 
of the last subtraction that modified the icc. The carry bit equals 0 otherwise. 
Logical instructions that modify the icc field always reset the carry bit to 0. 


Bits 19-12: Reserved (reserved)—This field is reserved. When you use the WRPSR instruction, this 
field should always be written with 0s. 


Bits 11-8: Processor Interrupt Level (PIL)—Specifies the levels of interrupt which the processor will 
accept. The processor accepts only interrupts with level 15 (non-maskable interrupts), or 
with levels higher than the value in the PIL field (maskable interrupts). Bit 11 is the most 
significant bit, and bit 8 is the least significant. 


Bit 7: Supervisor Mode (S)—Determines whether the processor is in supervisor mode (S=1) or 
user mode (S=0). Since instructions that write the PSR are available only in supervisor 
mode, the processor enters supervisor mode from user mode only when a reset, trap, or 
interrupt occurs. 


Bit 6: Prior S State (PS)—Records the value of the S bit when a trap is taken, so that the pro- 
cessor can return to the proper operating mode (user or supervisor) on return from the 
trap. Processor hardware changes the PS bit to the state of the S bit when entering a 
trap, and changes the S bit to the state of the PS bit when returning from the trap. 


Bit 5: Enable Traps (ET)—Enables traps (ET=1). When ET=0, traps are disabled and all inter- 
rupts are ignored. 


Bits 4-0: Current Window Pointer (CWP)—Points to the register window which is currently active. 
The CWP is written and read by the WRPSR and RDPSR instructions, is decremented by 
traps and the SAVE instruction, and is incremented by the RESTORE and RETT instruc- 
tions. The SPARClite processor implements 8 out of the 32 windows allowed in the 
SPARC definition, so only the 3 least significant bits of the CWP field are used. Arithmetic 
on the CWP is always performed modulo 8. Attempting to write a value to the CWP field 
which points to an unimplemented window results in an “illegal instruction” error. 
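

The PSR bit assignments listed above can be summarized in code. The sketch below unpacks a 32-bit PSR value into its fields and applies the interrupt acceptance rule given for the PIL and ET fields. The function names and the example PSR value are chosen for illustration.

```python
def decode_psr(psr):
    """Unpack a 32-bit PSR value into a dictionary of fields."""
    return {
        "impl": (psr >> 28) & 0xF,   # bits 31-28
        "ver": (psr >> 24) & 0xF,    # bits 27-24
        "n": (psr >> 23) & 1,        # icc negative
        "z": (psr >> 22) & 1,        # icc zero
        "v": (psr >> 21) & 1,        # icc overflow
        "c": (psr >> 20) & 1,        # icc carry
        "PIL": (psr >> 8) & 0xF,     # bits 11-8
        "S": (psr >> 7) & 1,         # bit 7
        "PS": (psr >> 6) & 1,        # bit 6
        "ET": (psr >> 5) & 1,        # bit 5
        "CWP": psr & 0x1F,           # bits 4-0
    }


def interrupt_accepted(psr, level):
    """Apply the PIL rule: accept level 15, or any level above PIL."""
    if not 1 <= level <= 15:
        raise ValueError("interrupt level must be 1..15")
    fields = decode_psr(psr)
    if fields["ET"] == 0:
        return False                 # traps disabled: interrupts ignored
    return level == 15 or level > fields["PIL"]


if __name__ == "__main__":
    # Example value: ver=2, n=1, PIL=10, S=1, PS=1, ET=1, CWP=3.
    example = 0x02800AE3
    print(decode_psr(example))
    print(interrupt_accepted(example, 10), interrupt_accepted(example, 11))
```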


Window Invalid Mask Register (WIM) 


The Window Invalid Mask Register contains 8 register-window mask bits, each of 
which corresponds to an implemented register window. If an operation which 
normally increments or decrements the Current Window Pointer would cause the 
CWP to point to a window whose corresponding WIM bit equals 1, a Window 
Overflow or Window Underflow trap occurs. 


The WIM can be written with the WRWIM instruction, and read with the RDWIM 
instruction. Both of these instructions are privileged. Bits corresponding to unim- 
plemented windows are read as 0s; values written to these bits are ignored. 


Bit:     31-8       7    6    5    4    3    2    1    0 
Field:   reserved   W7   W6   W5   W4   W3   W2   W1   W0 


Figure 2-4. Window Invalid Mask Register 


Bits 31-8: Reserved Field (reserved)—This field is reserved for potential future expansion to addi- 
tional windows. 


Bits 7-0: Window Masks (W7-W0)—Window mask bits, with W7 the mask bit for window 7, and so 
on. 
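

The WIM check can be sketched as follows. The functions model only the decision made when SAVE or RESTORE is about to change the CWP; what happens to the CWP during the resulting trap entry is not modeled. The function names and return convention are chosen for illustration.

```python
NWINDOWS = 8


def window_is_invalid(wim, window):
    """True if the WIM bit for `window` equals 1."""
    return (wim >> window) & 1 == 1


def try_save(cwp, wim):
    """Return (trap, new_cwp) for a SAVE; trap is None on success."""
    target = (cwp - 1) % NWINDOWS
    if window_is_invalid(wim, target):
        return "window_overflow", None
    return None, target


def try_restore(cwp, wim):
    """Return (trap, new_cwp) for a RESTORE; trap is None on success."""
    target = (cwp + 1) % NWINDOWS
    if window_is_invalid(wim, target):
        return "window_underflow", None
    return None, target


if __name__ == "__main__":
    wim = 0b00000001              # window 0 marked invalid
    print(try_save(2, wim))       # moves to window 1
    print(try_save(1, wim))       # would enter window 0
    print(try_restore(7, wim))    # would wrap around into window 0
```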


Trap Base Register (TBR) 


The Trap Base Register contains three fields used by the processor to generate the 
address of the service routine when an interrupt or trap occurs. (The reset trap 
and breakpoint traps are the exception: They always bypass the TBR mechanism, 
transferring control to address 0 and 0x000003f0, respectively.) One of the three 
fields in the TBR can be written using the WRTBR instruction. The whole TBR can 
be read with the RDTBR instruction. Both of these instructions are privileged. 


Bit:     31-12   11-4   3-0 
Field:   TBA     tt     null 


Figure 2-5. Trap Base Register 


Bits 31-12: Trap Base Address (TBA)—Contains the most significant 20 bits of the trap table base 
address. The TBA field is written with the WRTBR instruction. 


Bits 11-4: Trap Type (tt)—Contains an offset into the trap table corresponding to the last trap taken. 
Each trap is identified by a unique 8-bit trap type number. The processor writes the 
appropriate trap type into the tt field when it recognizes a trap, and then uses the number 
as an offset into the trap table. The tt field remains unchanged until the next trap occurs. 
The WRTBR instruction does not affect the tt field. When single vector trapping (SVT) 
is enabled, the Trap Type bits are ignored in forming the trap vector: the vector is the 
address pointed to by TBA with all tt bits set to 0. The trap handler can still read the tt 
field to find out the origin of the current trap. 


Bits 3-0: Null (null)—This field is hardwired to 0 to force 4-word increments of the trap vector. The 
WRTBR instruction does not affect this field. 
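

The way the three TBR fields combine into a trap vector can be shown with a short calculation. This sketch covers only traps that go through the TBR mechanism; the reset and breakpoint traps, which bypass it, are not modeled. The function name and the example values are chosen for illustration.

```python
TBA_MASK = 0xFFFFF000   # bits 31-12 of the TBR


def trap_vector(tbr, tt, single_vector=False):
    """Compute the trap handler address for trap type `tt`.

    With single vector trapping enabled, the tt bits are taken as 0,
    so every trap vectors to the base of the trap table.
    """
    if not 0 <= tt <= 0xFF:
        raise ValueError("trap type must fit in 8 bits")
    if single_vector:
        tt = 0
    # The low four bits are always 0, so entries are 16 bytes (4 words) apart.
    return (tbr & TBA_MASK) | (tt << 4)


if __name__ == "__main__":
    tbr = 0x40000000
    print(hex(trap_vector(tbr, 0x05)))
    print(hex(trap_vector(tbr, 0x05, single_vector=True)))
```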


Y Register 


The “Y Register” is composed of a number of 32-bit latches, muxes, and bus driv- 
ers which reside in the data path of the Execute Block (see the Internal Architecture 
chapter). It is used during the multiply step instruction (MULScc) to contain the 
multiplier and the least significant bits of the partial products as they are evalu- 
ated. It is used during the divide step instruction (DIVScc) to contain the most sig- 
nificant 32 bits of a 64-bit dividend and the partial remainders as they are 
evaluated. It is also used by the multiply unit to hold the most significant words 
of the partial products and, when the multiplication is completed, the high 32 bits 
of the 64-bit product. 


The Y register can be read and written with the RDY and WRY instructions, 
respectively. WRY is not a “delayed write” instruction: the value written into the 
Y register is available to the following instruction. 


31 0 

Figure 2-6. Y Register 


• Multiply Step Support—At the beginning of a multiplication algorithm which 
uses the MULScc instruction, the 32-bit multiplier is loaded into the Y register 
with a WRY instruction. When the multiplication is completed, the least 
significant word of the 64-bit product will be in the Y register. 


• Divide Step Support—At the beginning of a division algorithm which uses the 
DIVScc instruction, the most significant word of the dividend is loaded into 
the Y register with a WRY instruction. At the end of the divide routine, the 
remainder will be in the Y register and can be read with a RDY instruction. 


• Multiply Unit Support—The Y register is also used by the Multiply Unit (MU) 
during the UMUL, UMULcc, SMUL, and SMULcc instructions. The most 
significant word of the 64-bit product will be in the Y register when the 
multiplication completes. 
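

The split of a 64-bit product between the destination register and the Y register can be illustrated as follows. The sketch assumes, as in the SPARC multiply instructions, that the least significant word goes to the destination register; only the Y register half is stated in the text above. The function names mirror the instruction mnemonics but are otherwise hypothetical, and condition codes are not modeled.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def umul(a, b):
    """Unsigned multiply: return (destination word, Y register)."""
    product = (a & MASK32) * (b & MASK32)
    return product & MASK32, product >> 32


def smul(a, b):
    """Signed multiply: return (destination word, Y register)."""
    product = (to_signed32(a) * to_signed32(b)) & MASK64
    return product & MASK32, product >> 32


if __name__ == "__main__":
    low, y = umul(0xFFFFFFFF, 2)
    print(hex(low), hex(y))
    low, y = smul(0xFFFFFFFF, 2)   # -1 * 2
    print(hex(low), hex(y))
```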


Program Counter (PC) 


The Program Counter contains the word address of the instruction currently 
being executed by the Integer Unit. The PC cannot be directly read or written. 


31 0 


Instruction Address 


Figure 2-7. Program Counter 


Next Program Counter (nPC) 


The Next Program Counter contains the word address of the next instruction to 
be executed, assuming a trap does not occur. The nPC cannot be directly read or 
written. 


In delayed control transfers, the instruction that immediately follows the control 
transfer (the delay instruction) may be executed before control is transferred to 
the target. (See the Instructions section, below.) The nPC is necessary for imple- 
menting this feature. Most instructions complete by copying the contents of the 
nPC into the PC, then updating the nPC. The nPC is incremented by 4, unless the 
instruction implies a control transfer, in which case the computed target address 
is written into the nPC. The PC now points to the instruction which will be exe- 
cuted next, while the nPC points to the instruction which will be executed after 
that. 
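The PC/nPC update rule described above can be sketched as a small software model. This is only an illustration: the function name `step` and the `branch_target` parameter are hypothetical, and annulment of the delay instruction and traps are deliberately left out.

```python
def step(pc, npc, branch_target=None):
    """Advance (PC, nPC) past one instruction.

    branch_target is the computed target address when the instruction
    is a taken control transfer, otherwise None.
    """
    new_pc = npc                                  # PC <- nPC
    if branch_target is None:
        new_npc = (npc + 4) & 0xFFFFFFFF          # sequential: nPC <- nPC + 4
    else:
        new_npc = branch_target & 0xFFFFFFFF      # transfer: nPC <- target
    return new_pc, new_npc


# A taken branch at 0x100 to 0x200: the delay instruction at 0x104
# still executes before control reaches the target.
pc, npc = 0x100, 0x104
trace = [pc]
pc, npc = step(pc, npc, branch_target=0x200)      # branch executes
trace.append(pc)
pc, npc = step(pc, npc)                           # delay instruction executes
trace.append(pc)
```

In this model the sequence of executed addresses is 0x100, 0x104, 0x200, which is why the target address is written into the nPC rather than directly into the PC.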


31 0 


Instruction Address 


Figure 2-8. Next Program Counter 


Ancillary State Registers (ASR[31:1]) 


The SPARC definition includes 31 Ancillary State Registers, 15 of which 
(ASR[15:1]) are reserved for future use. The remaining ASRs can be defined and 
used in any way by SPARC implementations. The MB86930 defines the following 
ASR: 


ASR17—Used to enable and disable single-vector trapping. When this feature 
is enabled, all traps (except reset and breakpoint traps) vector to a single 
location, the base address of the trap table, as specified by the TBA field of 
the TBR register (tt=0). ASR17 can be read and written with the privileged 
instructions RDASR and WRASR. 





Reserved 
Reserved 
SVT, RST=0 


Figure 2-9. Ancillary State Register 17 


Bits 2-1: Reserved Field (reserved)—When writing to ASR17, both of these bits must be written 
with 0s. 


Bit 0: Single Vector Trapping (SVT)—Enables single vector trapping when set to 1. The SVT bit 
equals 0 at reset. 


2.3.4 Memory-Mapped Control Registers 


In addition to the registers defined by the SPARC architecture, the MB86930 pro- 
vides a collection of memory-mapped registers which control peripheral func- 
tions. Figure 2-10 shows these registers and their locations in memory. The 
memory-mapped registers can be read and written with the alternate-space load 
and store instructions, which are privileged. 


0x00000000 ASI=0x1 | Cache/Bus Interface Unit Control Register 
0x00000004 ASI=0x1 | Lock Control Register 
0x00000008 ASI=0x1 | Lock Control Save Register 
0x0000000C ASI=0x1 | Cache Status Register 
0x00000010 ASI=0x1 | Restore Lock Control Register 
0x00000080 ASI=0x1 | System Support Control Register 
0x00000120 ASI=0x1 | Same-Page Mask Register 
0x00000124 ASI=0x1 | Address Range Specifier Registers (ARSR <5:1>) 
0x00000140 ASI=0x1 | Address Mask Register (AMR <5:0>) 
0x00000160 ASI=0x1 | Wait-State Specifier Registers (WSSR <2:0>) 
0x00000174 ASI=0x1 | Timer Register 
0x00000178 ASI=0x1 | Timer Preload Register 





Figure 2-10. Locations of Memory-Mapped Control Registers 
Cache/Bus Interface Unit Control Register 


The Cache/BIU Control Register controls the operation of the data and instruction 
caches, and the write and prefetch buffers of the Bus Interface Unit. This register 
is located at address 0x00000000 with an ASI of 0x1. 


Bit 5: Write Buffer Enable (Enabled=1, Disabled=0, RST=0) 
Bit 4: Prefetch Buffer Enable (Enabled=1, Disabled=0, RST=0) 
Bit 3: Global Data Cache Lock (Lock On=1, Lock Off=0, RST=0) 
Bit 2: Data Cache Enable (Enabled=1, Disabled=0, RST=0) 
Bit 1: Global Instruction Cache Lock (Lock On=1, Lock Off=0, RST=0) 
Bit 0: Instruction Cache Enable (Enabled=1, Disabled=0, RST=0) 


Figure 2-11. Cache/Bus Interface Unit Control Register 


Bit 5: Write Buffer Enabled—When set to 1, enables the write buffer of the BIU only if both the 
instruction and data caches are enabled. At reset, this bit is 0. This bit should be changed 
only when the instruction and data caches are off. 


Bit 4: Prefetch Buffer Enabled—When set to 1, enables the prefetch buffer of the BIU only if 
both the instruction and data caches are enabled. At reset, this bit is 0. This bit should be 
changed only when the instruction and data caches are off. 


Bit 3: Global Data Cache Lock—Locks the current entries into the on-chip data cache; with this 
bit set to 1, no valid entry in the data cache will be replaced. To ensure the best 
performance with the cache locked, invalid words in allocated cache locations will be updated. 
On write hits, with the data cache locked, the data is not written to external memory, 
allowing the locked cache to be used as scratchpad RAM or a run-time stack, 
independent of main memory. When the Data Cache Lock bit is 0, the cache operates normally. 
At reset, this bit is 0. 


Bit 2: Data Cache Enable—Turns the on-chip data cache on (1) and off (0). At reset, this bit 
is 0. 


Bit 1: Global Instruction Cache Lock—Locks the current entries into the on-chip instruction 
cache; with this bit set to 1, no valid entry in the instruction cache will be replaced. To 
ensure the best performance with the cache locked, invalid words in allocated cache 
locations will be updated. When this bit is 0, the cache operates normally. Writes to the 
Instruction Cache Lock bit do not affect cache operation for the following three 
instructions. At reset, this bit is 0. 


Bit 0: Instruction Cache Enable—Turns the on-chip instruction cache on (1) and off (0). Writes 
to the Instruction Cache Enable bit do not affect cache operation for the following three 
instructions. At reset, this bit is 0. 
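The bit assignments above can be captured in a small helper for building and inspecting Cache/BIU Control Register values. This is a sketch only: the names `CBIR_FIELDS`, `encode_cbir`, `decode_cbir`, and `write_buffer_active` are hypothetical, and the code manipulates plain integers rather than accessing the register itself (which requires a privileged alternate-space store to address 0x00000000, ASI 0x1).

```python
# Bit positions taken from the bit descriptions above.
CBIR_FIELDS = {
    "icache_enable": 0,
    "icache_lock": 1,
    "dcache_enable": 2,
    "dcache_lock": 3,
    "prefetch_buffer_enable": 4,
    "write_buffer_enable": 5,
}


def encode_cbir(**flags):
    """Build a register value from keyword flags, e.g. icache_enable=True."""
    value = 0
    for name, on in flags.items():
        if on:
            value |= 1 << CBIR_FIELDS[name]
    return value


def decode_cbir(value):
    """Split a register value into a dict of booleans, one per field."""
    return {name: bool((value >> bit) & 1) for name, bit in CBIR_FIELDS.items()}


def write_buffer_active(value):
    """The write buffer only operates when both caches are enabled too."""
    f = decode_cbir(value)
    return f["write_buffer_enable"] and f["icache_enable"] and f["dcache_enable"]
```

For example, `encode_cbir(icache_enable=True, dcache_enable=True)` yields 0x05, a value that turns on both caches while leaving the buffers and locks off.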
Lock Control Register 


The Lock Control Register controls the locking of individual entries in the data 
and instruction caches. It is located at address 0x00000004 with an ASI of 0x1. 





Data Cache Entry Auto Lock (On=1, Off=0, RST=0) 
Instruction Cache Entry Auto Lock (On=1, Off=0, RST=0) 


Figure 2-12. Lock Control Register 


Bit 1: Data Cache Entry Auto Lock—Enables (1) and disables (0) auto-locking for entries in the 
on-chip data cache. All data accessed while this bit is 1 have the lock bits in their cache 
tags set to 1. Writes to this bit affect all subsequent data accesses. At reset, this bit is 0. 


Bit 0: Instruction Cache Entry Auto Lock—Enables (1) and disables (0) auto-locking for entries 
in the on-chip instruction cache. All instructions fetched while this bit is 1 have the lock 
bits in their cache tags set to 1. Writes to this bit do not affect cache operation for the fol- 
lowing three instructions. At reset, this bit is 0. 


Lock Control Save Register 


When an external interrupt or hardware trap occurs, the auto-locking of entries in 
on-chip cache is disabled. The Lock Control Save Register is used to re-enable 
auto-locking after the interrupt has been serviced. The value of the Lock Control 
Register before the interrupt or trap is automatically saved in the Lock Control 
Save Register, located at address 0x00000008 with an ASI of 0x1. To restore the 
correct auto-lock value on return from the service routine, software sets a bit in 
the Restore Lock Control Register. This will cause the value saved in the Lock 
Control Save Register to be moved to the Lock Control Register when a RETT is 
executed. (See Section 2.6.2) 





Previous Data Cache Entry Auto Lock (On=1, Off=0, RST=0) 
Previous Instruction Cache Entry Auto Lock (On=1, Off=0, RST=0) 


Figure 2-13. Lock Control Save Register 


Restore Lock Control Register 


On return from an external interrupt or hardware trap service routine, the Lock 
Control Register can have its previous value restored from the Lock Control Save 
Register. The Restore Lock Control Register, located at address 0x00000010 with 
an ASI of 0x1, controls this feature. When bit 0 of this register is set to 1 and a 
RETT instruction is executed, the value in the Lock Control Save Register is 
placed into the Lock Control Register. 


There should be no traps between writing a 1 to bit 0 of the Restore Lock Control 
Register and the corresponding RETT instruction. This bit is cleared to 0 on reset, 
and also when a return from external interrupt or hardware trap is executed. 
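The interaction between the Lock Control, Lock Control Save, and Restore Lock Control registers can be summarized with a small behavioral model. It is a sketch under stated assumptions: the class name `LockControl` and its methods are hypothetical, and only the save, disable, and restore behavior described above is modeled.

```python
class LockControl:
    """Behavioral model of the three lock-control registers."""

    def __init__(self):
        self.lock_control = 0   # bit 1: data auto-lock, bit 0: instruction auto-lock
        self.lock_save = 0      # Lock Control Save Register
        self.restore = 0        # bit 0 of the Restore Lock Control Register

    def take_trap(self):
        """External interrupt or hardware trap: save, then disable auto-locking."""
        self.lock_save = self.lock_control
        self.lock_control = 0

    def rett(self):
        """RETT: restore the saved value only if software requested it."""
        if self.restore & 1:
            self.lock_control = self.lock_save
        self.restore = 0        # the restore bit is cleared on return


# Service routine that asks for the previous auto-lock state to be restored.
lc = LockControl()
lc.lock_control = 0b11
lc.take_trap()
lc.restore = 1                  # written just before RETT, with no traps in between
lc.rett()
```

After this sequence `lc.lock_control` is back to 0b11. If the routine never sets the restore bit, auto-locking simply stays disabled after the return.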


31 0 


Restore Lock bit (Restore=1, Ignore=0, RST=0) 


Figure 2-14. Restore Lock Control Register 


Cache Status Register 


If an attempt is made to lock a cache entry which is already locked, bit 0 in the 
Cache Status Register is set to 1. This bit can be cleared by software. The Cache 
Status Register is located at address 0x0000000C with an ASI of 0x1. 


31 0 


Cache Status, RST=0 


Figure 2-15. Cache Status Register 


Same-Page Mask Register 


The Same-Page Mask Register controls the operation of the same-page detection 
logic by specifying which bits of the current ASI and address are to be compared 
with those of the previous ASI and address. If the specified (i.e., unmasked) bits 
all match, then the processor recognizes the two accesses as being “in the same 
page,” and asserts the -SAME_PAGE signal. This register should not be written 
if the bus interface unit will handle addresses that are affected by the change 
in the next 3 processor cycles. The Same-Page Mask Register is located at address 
0x00000120 with an ASI of 0x1. 


31 30 23 22 





0 
ASI Mask <7:0> Address Mask (ADR <31:10>) 
(Care=0, Don’t Care=1, RST=Undefined) (Care=0, Don’t Care=1, RST=Undefined) 





Figure 2-16. Same-Page Mask Register 
Bit 31: Reserved 


Bits 30-23: ASI Mask—Specifies which bits in the ASI of the current external access are to be com- 
pared with the corresponding bits in the ASI of the previous access. Only those bits are 
compared for which the mask bit is 0. Mis-matches in any other bits do not prevent the 
two accesses from being recognized as “on the same page.” The bits of this field are 
cleared to 0 on reset. 


Bits 22-1: Address Mask—Specifies which of the 22 most significant bits in the address of the cur- 
rent external access are to be compared with the corresponding bits in the address of the 
previous access. Only those bits are compared for which the mask bit is 0. Mis-matches 
in any other bits do not prevent the two accesses from being recognized as “on the same 
page.” The bits of this field are cleared to 0 on reset. 


Bit 0: Reserved 
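The same-page comparison can be sketched as follows. The function name `same_page` is hypothetical, and the mapping of register bit 22 to ADR<31> down to register bit 1 to ADR<10> is an assumption read off Figure 2-16; address bits 9-0 never take part in the comparison.

```python
def same_page(mask_reg, prev_asi, prev_addr, asi, addr):
    """Return True if two accesses are 'on the same page'.

    A mask bit of 0 means the corresponding bit is compared;
    a mask bit of 1 means it is a don't-care.
    """
    asi_mask = (mask_reg >> 23) & 0xFF          # register bits 30-23
    adr_mask = (mask_reg >> 1) & 0x3FFFFF       # register bits 22-1, ADR<31:10>
    asi_diff = (prev_asi ^ asi) & 0xFF
    adr_diff = ((prev_addr ^ addr) >> 10) & 0x3FFFFF
    asi_ok = (asi_diff & ~asi_mask & 0xFF) == 0
    adr_ok = (adr_diff & ~adr_mask & 0x3FFFFF) == 0
    return asi_ok and adr_ok


# Masking ADR<11:10> as don't-care gives 4-Kbyte pages.
FOUR_KB_PAGES = 0b11 << 1
```

With `FOUR_KB_PAGES`, addresses 0x1000 and 0x1FFC in the same ASI are on the same page, while 0x1000 and 0x2000 are not.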


Address Range Specifier Registers (ARSR[5:1]) 


Values in the Address Range Specifier Registers define up to five different 
address ranges, which are used for various system-support functions. The ARSRs 
are located in a contiguous block beginning at address 0x00000124 with ASI 0x1 
(see Table 2-3). 


The ARSRs, together with the Address Mask Registers, can be used to control the 
assertion of the Chip-Select outputs (-CS[5:1]). -CSn is asserted when the value 
on the address bus falls in the address range specified by ARSRn and AMRn. See 
the discussion of the Address Mask Registers, below. -CS0 is asserted when the 
value on the address bus, as masked by AMR0, falls into the lowest range of 
Supervisor Instruction Space. The range of -CS0 (as masked by AMR0) is 8K 
words. 


These registers should not be written if the bus interface unit will handle 
addresses that are affected by the change in the next 3 processor cycles. The user 
should be careful that two chip selects are never selected at the same time. A pro- 
grammable wait-state generator is also associated with each address range. See 
the discussion of the Wait-State Specifier Registers, below. 


31 30 23 22 1 0 


ASI <7:0> ADR <31:10> 
(RST=Undefined) (RST=Undefined) 


Figure 2-17. Address Range Specifier Registers 








Bit 31: Reserved 
Bits 30-23: ASI[7:0]—Specifies the ASI of a target address range. The value of this field is undefined 
on reset. 
Bits 22-1: ADR[31:10]—Specifies the 22 most significant bits of a target address range. The value 
of this field is undefined on reset. 


Bit 0: Reserved 


Address Mask Registers (AMR[5:0]) 


AMRn works with ARSRn to define an address range. AMRn specifies which bits 
of the currently driven ASI and address are to be compared with the contents of 
ARSRn, and which bits are “don’t cares.” Except for AMR0, reset leaves the 
values in the AMR registers undefined (see Table 2-3). These registers should not be 
written if the bus interface unit will handle addresses that are affected by the 
change in the next 3 processor cycles. The AMRs are located in a contiguous block 
beginning at address 0x00000140 with ASI 0x1. 


31 30 23 22 1 0 
ASI <7:0> ADR <31:10> 
(RST=Undefined)* (RST=Undefined)* 

* Except AMR[0]. See Table 2-3 





Figure 2-18. Address Mask Registers 


Bit 31: Reserved 


Bits 30-1: Mask—Specifies which bits in the ASI and address of the current external access are to 
be compared with the corresponding bits in the address-range specifier. Only those bits 
are compared for which the mask bit is 0. See Table 2-3 for reset value. 


Bit 0: Reserved 
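How an ARSR/AMR pair selects an address range can be illustrated with a short model. The function name `in_range` is hypothetical, and the field placement (ASI in bits 30-23, ADR<31:10> in bits 22-1) follows Figures 2-17 and 2-18. The example values for range 0 are the reset values as read from Table 2-3 and should be treated as an assumption.

```python
def in_range(arsr, amr, asi, addr):
    """Return True if (asi, addr) falls in the range defined by arsr and amr.

    Only bits whose mask bit in amr is 0 are compared against arsr.
    """
    key = ((asi & 0xFF) << 23) | (((addr >> 10) & 0x3FFFFF) << 1)
    care = ~amr & 0x7FFFFFFE        # bits 30-1 where the mask bit is 0
    return ((key ^ arsr) & care) == 0


# Range 0 at reset: ASI=0x09, ADR<31:10>=0, mask bits set only for ADR<14:10>.
ARSR0 = 0x09 << 23
AMR0 = 0b11111 << 1
```

With these values the range covers addresses 0x0000 through 0x7FFF of ASI 0x09, which is 32 Kbytes or 8K words, matching the size given for -CS0 in the ARSR discussion.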


Wait-State Specifier Registers (WSSR[2:0]) 


The wait-state specifiers determine, for each of the address ranges defined by the 
ARSR and AMR registers, the number of clock cycles between the time an address 
in a given range appears on the address bus and the time the processor generates 
an internal -READY signal. This makes it possible for memory and I/O devices 
with different access times to be connected to the processor without additional 
logic. 


The wait-state specifiers for the six address ranges are kept in three Wait-State 
Specifier Registers. These registers are located in a contiguous block beginning at 
address 0x00000160 with ASI 0x1 (see Table 2-3). Each register contains the wait- 
state specifiers for two address ranges. When the address currently being driven 
by the processor matches the unmasked bits in one of the Address Range Specifi- 
ers, the corresponding wait-state specifier is selected. These registers should not 
be written if the bus interface unit will handle addresses that are affected by the 
change in the next 3 processor cycles. 







Bits 31-27: Count 2 (RST=Undefined) 
Bits 26-22: Count 1 (RST=Undefined) 
Bit 21: Wait Enable (On=1, Off=0, RST=*) 
Bit 20: Single Cycle (On=1, Off=0, RST=0) 
Bit 19: Override (On=1, Off=0, RST=*) 
Bits 18-14: Count 2 (RST=Undefined)* 
Bits 13-9: Count 1 (RST=Undefined)* 
Bit 8: Wait Enable 
Bit 7: Single Cycle 
Bit 6: Override 
Bits 5-0: Reserved 

* See Table 2-3 


Figure 2-19. Wait-State Specifier Registers 


Bits 31-19: Wait-State Specifier—When an external access falls within an address range defined by 
an ARSR and AMR, the corresponding wait-state specifier determines when, and 
whether, the processor generates an internal -READY signal to terminate the access. 


Count2 (Bits 31-27): The number of wait-states inserted before the internal -READY, under the 
following conditions: the Single Cycle bit equals 0 and the current access is on the 
same page as the previous access. The number of wait-states is the value of 
this field +1 (i.e., 0=1 wait-state, 1=2 wait-states, etc.) The value of Count2 is 
undefined on reset. 


Count1 (Bits 26-22): The number of wait-states inserted before the internal -READY, under the 
following conditions: the Single Cycle bit equals 0 and the current access is not on 
the same page as the previous access. The number of wait-states is the value 
of this field +1 (i.e., 0=1 wait-state, 1=2 wait-states, etc.) The value of Count1 is 
undefined on reset. 


Wait Enable (Bit 21): Enables and disables the wait-state generator for an individual address range. 
If the Wait Enable bit of a wait-state specifier equals 0, the internal -READY is 
not asserted when addresses in the corresponding range are accessed by the 
processor. If Wait Enable is 1, the single cycle bit must be 0. See Table 2-3 for 
reset value. 


Single Cycle (Bit 20): Specifies the timing of the internal -READY signal. If the Single Cycle bit equals 
1 when an address in the appropriate range is accessed, the internal -READY 
is asserted in the same cycle. If the Single Cycle bit equals 0, and the current 
transaction is in the same page as the previous transaction, then Count2 is 
used as the number of cycles after which -READY is asserted internally. If the 
transaction is not in the same page, Count1 is used instead. If Single Cycle is 
enabled, the Wait Enable bit must be 0. See Table 2-3 for reset value. 


Override (Bit 19): Allows the system to terminate a memory transaction before the internally spec- 
ified time. If the Override bit equals 1, and external hardware asserts the exter- 
nal -READY signal, then the wait-state generator will stop counting and will wait 
for the next transaction. This bit is cleared to 0 on reset. 


Bits 18-6: Wait-State Specifier—The wait-state specifier for a second address range. This field is 
organized just like bits 31-19. 


Bits 5-0: Reserved 
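The field layout above can be exercised with a small decoder. This is a sketch, not a definitive model: the function names are hypothetical, and giving the Single Cycle bit priority over Wait Enable is an interpretation of the bit descriptions (the two bits are documented as mutually exclusive).

```python
def decode_specifier(spec13):
    """Decode one 13-bit wait-state specifier (register bits 31-19 or 18-6)."""
    return {
        "count2": (spec13 >> 8) & 0x1F,
        "count1": (spec13 >> 3) & 0x1F,
        "wait_enable": (spec13 >> 2) & 1,
        "single_cycle": (spec13 >> 1) & 1,
        "override": spec13 & 1,
    }


def split_wssr(reg):
    """Return the specifiers held in bits 31-19 and bits 18-6 of one WSSR."""
    upper = (reg >> 19) & 0x1FFF
    lower = (reg >> 6) & 0x1FFF
    return decode_specifier(upper), decode_specifier(lower)


def wait_states(spec, same_page):
    """Wait-states before the internal -READY, or None if it is not generated."""
    if spec["single_cycle"]:
        return 0                                  # -READY in the same cycle
    if not spec["wait_enable"]:
        return None                               # generator disabled for this range
    count = spec["count2"] if same_page else spec["count1"]
    return count + 1                              # field value 0 means 1 wait-state


# Upper specifier: Count1 = Count2 = 31, Wait Enable = 1, Override = 1.
# Lower specifier: Count2 = 1, Count1 = 3, Wait Enable = 1.
reg = (0x1FFD << 19) | (0x11C << 6)
upper, lower = split_wssr(reg)
```

Here `wait_states(upper, same_page=False)` is 32, while the lower specifier gives 2 wait-states for same-page accesses and 4 otherwise.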
System Support Control Register 


The System Support Control Register enables or disables the various system-sup- 
port features, independently of one another. However, the chip-select logic for 
address range 0 is always enabled, regardless of the value in the System Support 
Control Register. This register is located at address 0x00000080 with ASI 0x1 (see 
Table 2-3). 






Bits 31-6: Reserved 
Bit 5: Same-Page Enable (On=1, Off=0, RST=0) 
Bit 4: Chip Select Enable (On=1, Off=0, RST=0) 
Bit 3: Programmable Wait-State (On=1, Off=0, RST=1) 
Bit 2: Timer On/Off (On=1, Off=0, RST=0) 
Bits 1-0: Reserved 

Note: The chip select generation for Address Range Specifier 0 is always enabled, 
regardless of the value of the Chip Select Enable Bit. 


Figure 2-20. System Support Control Register 


Bits 31-6: Reserved 


Bit 5: Same-Page Enable—Enables (1) and disables (0) the same-page detection logic. When 
this bit is 1, the -SAME_PAGE signal is asserted whenever the address of an external 
access is on the same page as the previous access. The page size is controlled by the 
Same-Page Mask Register (see above). When this bit is 0, -SAME_PAGE is never 
asserted. The Same-Page Enable bit is cleared to 0 on reset. 


Bit 4: Chip Select Enable—Enables (1) and disables (0) the generation of chip-select signals 
for external accesses in address ranges 1 through 5. Regardless of the state of this bit, 
however, -CS0 is always asserted when the current address lies in address range 0. The 
Chip Select Enable bit is cleared to 0 on reset. 


Note: Before enabling chip selects, all chip select Address Mask and Address Range 
registers should be initialized so that two chip selects are never selected at the same time. 


Bit 3: Programmable Wait-State—Enables (1) and disables (0) the programmable wait-state 
generators for address ranges 1 through 5 (see the discussion of the Wait-State Specifier 
Registers, above). Wait-state generation is always enabled for address range 0, 
regardless of the state of this bit. The Programmable Wait-State bit is set to 1 on processor 
reset. 


Bit 2: Timer On/Off—Enables (1) and disables (0) the timer. This bit is cleared to 0 on reset. 


Bits 1-0: Reserved 
Table 2-3: System Support Register Summary 


Register addresses (ASI=0x01): 

Chip-Select | Address Range Specifier | Address Mask | Wait-State Specifier 
0           | N/A                     | 0x0000 0140  | 0x0000 0160 (low halfword) 
1           | 0x0000 0124             | 0x0000 0144  | 0x0000 0160 (high halfword) 
2           | 0x0000 0128             | 0x0000 0148  | 0x0000 0164 (low halfword) 
3           | 0x0000 012C             | 0x0000 014C  | 0x0000 0164 (high halfword) 
4           | 0x0000 0130             | 0x0000 0150  | 0x0000 0168 (low halfword) 
5           | 0x0000 0134             | 0x0000 0154  | 0x0000 0168 (high halfword) 


Values at reset: 

Chip-Select | Affected by Chip-Select Enable | Address Range Specifier | Address Mask                            | Wait-State Specifier 
0           | N/A                            | ASI=0x09, ADR<31:10>=0  | All mask bits 0 except ADR<14:10> = 1   | Count 1,2 = 31, Wait Enable=1, Single Cycle=0, Override=1 
1 through 5 | Yes                            | Undefined               | Undefined                               | Count 1,2 = Undefined, Wait Enable=0, Single Cycle=0, Override=0 


Timer Register 


The Timer Register contains the current count of the internal 16-bit timer. When 
the timer overflows, the processor asserts the -TIMER_OVF signal and reloads 
the Timer Register with the contents of the Timer Preload Register. The Timer 
Register can also be loaded directly by writing to the address 0x00000174 with 
ASI 0x1. The timer is clocked at the processor clock frequency. 


31 16 15 0 
Timer Value 


Figure 2-21. Timer Register 


Timer Preload Register 


The Timer Preload Register contains the value which is loaded into the timer 
when the timer overflows. In effect, this register specifies the number of clock 
cycles between assertions of the -TIMER_OVF signal. The Timer Preload Register 
is located at address 0x00000178 with ASI 0x1. 


31 16 15 0 
Timer Pre-Load Value 


Figure 2-22. Timer Pre-Load Register 
2.4 Data Types 


Direct support is provided for signed and unsigned integers of various lengths, as 
illustrated in Figure 2-23. A tagged word type is supported for tagged arithmetic, 
used in artificial intelligence applications. Other data types (character strings, 
floating-point types, and so on) must be handled in software. 





Signed Integer Byte:        sign [7] | value [6:0] 
Signed Integer Halfword:    sign [15] | value [14:0] 
Signed Integer Word:        sign [31] | value [30:0] 
Signed Integer Double:      SD-0: sign [31] | signed_integer [62:32] in bits [30:0] 
                            SD-1: signed_integer [31:0] in bits [31:0] 

Unsigned Integer Byte:      value [7:0] 
Unsigned Integer Halfword:  value [15:0] 
Unsigned Integer Word:      value [31:0] 
Unsigned Integer Double:    UD-0: unsigned_integer [63:32] in bits [31:0] 
                            UD-1: unsigned_integer [31:0] in bits [31:0] 

Tagged Word:                value [31:2] | tag [1:0] 


Figure 2-23. Data Types 


2.5 Instructions 


SPARClite provides an upward-compatible superset of the SPARC integer 
instruction set. Each instruction is a single 32-bit word. There are only three basic 
instruction formats, and few addressing modes. 


The additional MB86930 instructions—integer divide-step, and scan for first 
changed bit—are implemented to achieve higher performance in embedded 
applications. Table 2-4 lists the MB86930 instruction set by function, and shows 
how to interpret the instruction mnemonics. 
Table 2-4: Instruction Mnemonics 


Load and Store: 

    {LoaD | STore} {Signed | Unsigned} {Byte | Halfword | word | Double word} {normal | Alternate} 
    atomic SWAP word 
    atomic Load-Store Unsigned Byte 


Control Transfer: 

    Branch on Integer CC {normal | Annul delay instr.} 
    CALL 
    Trap on Integer CC 
    JuMP and Link 
    RETurn from Trap 


Logical: 

    {AND | OR | XOR} {normal | Not} {normal | set CC} 


Arithmetic and Shift: 

    {UMUL | SMUL} {normal | set CC} 
    {ADD | SUB} {normal | eXtended} {normal | set CC} 
    Shift {Left | Right} {Logical | Arithmetic} 
    Tagged {ADD | SUB} set CC {normal | Trap oVerflow} 
    SCAN 
    DiVide Step set CC 
    MULtiply Step set CC 
    SETHI 


Read/Write Control Registers: 

    {ReaD | WRite} {Y | PSR | WIM | TBR | ASR} 
    SAVE 
    RESTORE 


In the MB86930 processor, the floating-point and coprocessor instructions defined 
in the SPARC architecture are trapped for software emulation. 



2.5.1 Instruction Formats 


Figure 2-24 shows the three basic instruction formats. 


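Format 1 (op=1): CALL 

    op [31:30] | disp30 [29:0] 

Format 2 (op=0): SETHI & Branches (Bicc, FBfcc, CBccc) 

    op [31:30] | rd [29:25] | op2 [24:22] | imm22 [21:0] 
    op [31:30] | a [29] | cond [28:25] | op2 [24:22] | disp22 [21:0] 

Format 3 (op=2 or 3): Remaining instructions 

    op [31:30] | rd [29:25] | op3 [24:19] | rs1 [18:14] | i=0 [13] | asi [12:5] | rs2 [4:0] 
    op [31:30] | rd [29:25] | op3 [24:19] | rs1 [18:14] | i=1 [13] | simm13 [12:0] 
    op [31:30] | rd [29:25] | op3 [24:19] | rs1 [18:14] | opf [13:5] | rs2 [4:0] 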


Figure 2-24. Instruction Formats 


op, op2, op3    One or more of these fields appear in every format to encode the 
                instruction. The 2-bit op field is used in all three formats, and is 
                interpreted as follows: 

                op Encoding (All Formats) 

                op  | Instructions 
                0   | Bicc, FBfcc, CBccc, SETHI 
                1   | CALL 
                2   | arithmetic, logical, shift and remaining 
                3   | memory instructions 

                The 3-bit op2 field is used, along with the op field, to encode the 
                format 2 instructions, and is interpreted as follows: 

                op2 Encoding (Format 2) 

                op2 | Instructions 
                0   | unimplemented 
                1   | unimplemented 
                2   | Bicc 
                3   | unimplemented 
                4   | SETHI 
                5   | unimplemented 
                6   | FBfcc 
                7   | CBccc 
                The 6-bit op3 field is used, along with the op field, to encode the 
                format 3 instructions. An Instruction Index by Operation Code is 
                given in Chapter 7 of this manual. 

rd, rs1, rs2    These 5-bit fields contain register addresses, interpreted as 
                discussed in the General-Purpose Registers section, above. The rd field 
                specifies the source operand for a store, or the destination 
                operand for some other operation. The rs1 and rs2 fields specify 
                source operands. 

disp30, disp22  These 30-bit and 22-bit fields contain word-aligned, 
                sign-extended, PC-relative displacements for a call or branch, 
                respectively. 

a               This bit is used in branch instructions to specify whether or not 
                the instruction following the branch can be annulled. 

cond            This 4-bit field selects the condition codes to test for a conditional 
                branch instruction. 

imm22           Contains a 22-bit constant which the SETHI instruction places in 
                the upper end of a specified destination register. 

i               Selects the second ALU operand for arithmetic and load/store 
                instructions. If i equals 0, the operand is r[rs2]. If i equals 1, the 
                operand is simm13, sign-extended from 13 to 32 bits. 

simm13          Contains a sign-extended 13-bit immediate value used as the 
                second ALU operand for an arithmetic or load/store instruction 
                when i equals 1. 

asi             Contains the 8-bit Address Space Identifier required for the load 
                alternate and store alternate instructions. 

opf             Encodes a floating-point operate or coprocessor operate 
                instruction. All such instructions are trapped for software emulation. 
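As an illustration of the format 3 field layout, the following sketch extracts the fields from a 32-bit instruction word. The function names are hypothetical, the bit positions are those of the standard SPARC format 3 encoding shown in Figure 2-24, and the opf variant is not handled.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_format3(word):
    """Split a format 3 instruction word into its named fields."""
    fields = {
        "op": (word >> 30) & 0x3,
        "rd": (word >> 25) & 0x1F,
        "op3": (word >> 19) & 0x3F,
        "rs1": (word >> 14) & 0x1F,
        "i": (word >> 13) & 0x1,
    }
    if fields["i"]:
        # i = 1: second operand is the sign-extended 13-bit immediate.
        fields["simm13"] = sign_extend(word & 0x1FFF, 13)
    else:
        # i = 0: second operand is r[rs2]; bits 12-5 carry the asi field.
        fields["asi"] = (word >> 5) & 0xFF
        fields["rs2"] = word & 0x1F
    return fields
```

For example, the word 0x84007FFF decodes to op=2, rd=2, op3=0, rs1=1, i=1 with simm13 equal to -1, which shows the sign extension of an all-ones immediate.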


2.5.2 Logical Instructions 


The logical instructions perform bit-wise boolean operations. As shown in 
Table 2-5, each logical instruction comes in two versions: one leaves the integer 
condition codes in the Processor State Register unchanged; the other changes the 
condition codes as a side-effect. 
Table 2-5: Logical Instructions 


AND      And 
ANDcc    And and modify icc 
ANDN     And Not 
ANDNcc   And Not and modify icc 
OR       Inclusive Or 
ORcc     Inclusive Or and modify icc 
ORN      Inclusive Or Not 
ORNcc    Inclusive Or Not and modify icc 
XOR      Exclusive Or 
XORcc    Exclusive Or and modify icc 
XNOR     Exclusive Nor 
XNORcc   Exclusive Nor and modify icc 





The logical instructions are all format 3 instructions. When the i field is 0, they 
take their arguments from two source registers (r[rs1] and r[rs2]); when the i field 
is 1, they take one argument from source register r[rs1] and the other from the 
simm13 field (sign-extended to 32 bits). In both cases, the result is written to the 
destination register r[rd]. 
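The six logical operations and their "cc" variants can be modeled on 32-bit values as below. This is a sketch: `LOGICAL_OPS` and `logical` are hypothetical names, and the condition-code results (n from bit 31, z on a zero result, v and c cleared) follow the SPARC Version 8 definition rather than anything stated in this section.

```python
MASK32 = 0xFFFFFFFF

# "Not" variants complement the second operand (or the XOR result).
LOGICAL_OPS = {
    "AND": lambda a, b: a & b,
    "ANDN": lambda a, b: a & ~b,
    "OR": lambda a, b: a | b,
    "ORN": lambda a, b: a | ~b,
    "XOR": lambda a, b: a ^ b,
    "XNOR": lambda a, b: ~(a ^ b),
}


def logical(mnemonic, a, b):
    """Return (result, icc); icc is None unless the mnemonic ends in 'cc'."""
    sets_cc = mnemonic.endswith("cc")
    base = mnemonic[:-2] if sets_cc else mnemonic
    result = LOGICAL_OPS[base](a & MASK32, b & MASK32) & MASK32
    icc = None
    if sets_cc:
        icc = {"n": result >> 31, "z": int(result == 0), "v": 0, "c": 0}
    return result, icc
```

A sign-extended immediate can be passed as a negative Python integer; for instance `logical("AND", 0x1234, -1)` leaves the first operand unchanged because -1 becomes 0xFFFFFFFF after masking.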


2.5.3 Arithmetic and Shift Instructions 


The integer arithmetic instructions are generally three-register instructions which 
compute a result that is a function of the two source operands, and either write 
the result into the destination register r[rd], or discard it. One of the source 
operands is always taken from register r[rs1]; the other source depends on the i bit in 
the instruction. If i equals 0, the second operand is taken from register r[rs2]; if i 
equals 1, the second operand is the value in the simm13 field of the instruction, 
sign-extended to 32 bits. By specifying global register 0 as the destination, the 
instruction effectively discards the result. (See Section 2.3.2, Special Uses of the r 
Registers). 


Besides the standard arithmetic operations, SPARC provides instructions to 
perform tagged arithmetic. In tagged arithmetic, the two least-significant bits of each 
operand are used to indicate the (user-defined) data type of the operand. The 
tagged arithmetic instructions set a condition code if the tag of an operand is not 
zero. 


The shift instructions shift the contents of an r register by a constant or variable 
number of bits. They do not affect the condition codes. 
Besides the instructions defined in the (Version 8) SPARC architecture, SPARClite 
provides: 


• A divide-step instruction, which can be used to construct efficient iterative 
integer division algorithms. 


• A scan instruction, which determines the first bit in a word which differs from 
the most-significant bit. The scan instruction can be used to simplify and 
accelerate many important operations, like normalizing numbers with 
redundant sign bits. 


Add and Subtract 


The integer addition and subtraction instructions, listed in Table 2-6, perform 
two’s-complement arithmetic. Each instruction comes in four versions: these 
either affect the integer condition codes in the Processor State Register or leave them 
unchanged, and either include the carry bit in the result or ignore it. 


Table 2-6: Addition and Subtraction Instructions 


ADD Add 
ADDcc Add and modify icc 
ADDX Add with Carry 


ADDXcc Add with Carry and modify icc 


SUB Subtract 

SUBcc Subtract and modify icc 

SUBX Subtract with Carry 

SUBXcc Subtract with Carry and modify icc 





The integer addition and subtraction instructions are format 3 instructions. When 
the i field is 0, they take their arguments from two source registers (r[rs1] and 
r[rs2]); when the i field is 1, they take one argument from a source register and the 
other from the simm13 field (sign-extended to 32 bits). The result is written to the 
destination register r[rd]. 


In subtraction, the second argument, whether register (r[rs2]) or immediate 
(simm13), is always subtracted from the first (r[rs1]). 


The extended addition instructions ADDX and ADDXcc also add the carry bit (c) 
of the Processor State Register; that is, they compute either “r[rs1] + r[rs2] + c” or 
“r[rs1] + sign-extended(simm13) + c,” and store the result in r[rd]. 


The extended subtraction instructions SUBX and SUBXcc also subtract the carry 
bit (c); that is, they compute either “r[rs1] - r[rs2] - c” or 
“r[rs1] - sign-extended(simm13) - c,” and store the result in r[rd]. 
Overflow occurs on addition if both operands have the same sign and the sign of 
the sum is different. Overflow occurs on subtraction if the operands have 
different signs and the sign of the difference differs from the sign of r[rs1]. 
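The overflow rules and the role of the carry bit can be checked with a short model of the condition-code-setting add and subtract. The names `addcc` and `subcc` are hypothetical stand-ins for ADDcc/ADDXcc and SUBcc/SUBXcc (pass `carry_in` for the extended forms). Treating the carry bit on subtraction as a borrow follows the SPARC Version 8 definition and is an assumption as far as this section is concerned.

```python
MASK32 = 0xFFFFFFFF


def addcc(a, b, carry_in=0):
    """32-bit add; returns (result, icc). Use carry_in for ADDX/ADDXcc."""
    a &= MASK32
    b &= MASK32
    total = a + b + carry_in
    result = total & MASK32
    sa, sb, sr = a >> 31, b >> 31, result >> 31
    icc = {
        "n": sr,
        "z": int(result == 0),
        "v": int(sa == sb and sr != sa),   # same-sign operands, different-sign sum
        "c": total >> 32,                  # carry out of bit 31
    }
    return result, icc


def subcc(a, b, carry_in=0):
    """32-bit subtract a - b - carry_in; returns (result, icc)."""
    a &= MASK32
    b &= MASK32
    diff = a - b - carry_in
    result = diff & MASK32
    sa, sb, sr = a >> 31, b >> 31, result >> 31
    icc = {
        "n": sr,
        "z": int(result == 0),
        "v": int(sa != sb and sr != sa),   # different-sign operands, sign flips
        "c": int(diff < 0),                # borrow
    }
    return result, icc


# 64-bit addition built from a 32-bit add followed by an add-with-carry.
low, low_icc = addcc(0xFFFFFFFF, 1)
high, _ = addcc(0, 0, carry_in=low_icc["c"])
```

In the 64-bit example the low word wraps to 0 with the carry set, and the extended add propagates that carry so the high word becomes 1.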


A special comparison instruction for integer values is not needed, since it can be 
easily synthesized from the SUBcc instructions (See Chapter 7). 


Tagged Add and Subtract 


The tagged arithmetic instructions, listed in Table 2-7, perform two’s-complement 
addition or subtraction on their operands. 


Table 2-7: Tagged Arithmetic Instructions 


TADDcc     Tagged Add and modify icc 
TADDccTV   Tagged Add, modify icc and Trap on Overflow 
TSUBcc     Tagged Subtract and modify icc 
TSUBccTV   Tagged Subtract, modify icc and Trap on Overflow 


If either operand has a non-zero tag, or if arithmetic overflow occurs, the 
overflow bit of the Processor State Register is set to 1. The trapping versions 
(TADDccTV and TSUBccTV) also cause a tag_overflow trap whenever they set 
the overflow bit. Except for these special side effects, the tagged arithmetic 
instructions work just like the ordinary addition and subtraction instructions, 
which are described above. 


TADDcc and TSUBcc modify the integer condition codes; TADDccTV and 
TSUBccTV also modify the condition codes when they do not trap. 
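The tagged-add behavior can be sketched as follows. The names `taddcc` and `TagOverflowTrap` are hypothetical; the trap is modeled as a Python exception so that, as described above, no result or condition codes are produced when the trapping version traps.

```python
MASK32 = 0xFFFFFFFF


class TagOverflowTrap(Exception):
    """Stands in for the tag_overflow trap of TADDccTV."""


def taddcc(a, b, trap_on_overflow=False):
    """Tagged add; set trap_on_overflow=True to model TADDccTV."""
    a &= MASK32
    b &= MASK32
    total = a + b
    result = total & MASK32
    sa, sb, sr = a >> 31, b >> 31, result >> 31
    arithmetic_overflow = sa == sb and sr != sa
    tag_nonzero = ((a | b) & 0x3) != 0          # either operand has a non-zero tag
    v = int(arithmetic_overflow or tag_nonzero)
    if v and trap_on_overflow:
        raise TagOverflowTrap("tag_overflow")
    icc = {"n": sr, "z": int(result == 0), "v": v, "c": total >> 32}
    return result, icc
```

With both tags zero, `taddcc(4, 8)` behaves like an ordinary add. With a non-zero tag in either operand, the non-trapping form still returns the sum but reports v=1, leaving it to software to branch to a handler for non-integer operands.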











Multiply and Multiply-Step 


The integer multiplication instructions, listed in Table 2-8, are directly supported 
in hardware. 


Table 2-8: Integer Multiply Instructions 


UMUL Unsigned Integer Multiply 
SMUL Signed Integer Multiply 


UMULcc Unsigned Integer Multiply and modify icc 
SMULcc Signed Integer Multiply and modify icc 
MULScc Multiply Step and modify icc 





The multiply instructions perform a signed or unsigned multiplication of a 32-bit
multiplicand (r[rs1]) and a 32-bit multiplier (either r[rs2] or simm13, sign-
extended to 32 bits), resulting in a 64-bit product. The low-order 32 bits of the
product are placed in the destination register (r[rd]), and the upper 32 bits of the
product are placed in the Y register.


In general, the multiplication requires 5 cycles, but there are three special cases of
early termination. If either the multiplier or the multiplicand is zero, the execution
takes 1 cycle. If the multiplier is an 8-bit integer or less, the execution takes 2
cycles. If the multiplier is a 9-bit to 16-bit integer, the execution takes 3 cycles.


UMUL and SMUL do not affect the integer condition codes. The effect of
UMULcc and SMULcc on the condition codes is shown in Table 2-9.


Table 2-9: Effect of Integer Multiplication on Condition Codes


icc bit   UMULcc                      SMULcc
N         Set if product[31] = 1      Set if product[31] = 1
Z         Set if product[31:0] = 0    Set if product[31:0] = 0
V         Zero                        Zero
C         Zero                        Zero
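The way the 64-bit product is split between r[rd] and the Y register, and the resulting condition codes, can be sketched as follows. This is an illustrative Python model, not processor code; the function names and the dictionary used for the condition codes are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit register image as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def mulcc(rs1, op2, signed):
    """Model UMULcc (signed=False) or SMULcc (signed=True).

    Returns (rd, y, icc): the low word, the high word, and the
    condition codes derived from the low word of the product.
    """
    a, b = (to_signed(rs1), to_signed(op2)) if signed else (rs1, op2)
    product = (a * b) & MASK64
    rd = product & MASK32          # low-order 32 bits go to r[rd]
    y = product >> 32              # upper 32 bits go to the Y register
    icc = {"n": rd >> 31, "z": int(rd == 0), "v": 0, "c": 0}
    return rd, y, icc
```

Multiplying 0xFFFFFFFF by 2 shows the difference between the two forms: the low word is the same, but the Y register holds 1 for the unsigned product and 0xFFFFFFFF for the signed one.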





The multiply-step instruction, MULScc, treats r[rs1] and the Y register as a single, 
64-bit, right-shiftable doubleword register. The least significant bit of r[rs1] is 
treated as if it were adjacent to the most significant bit of the Y register.


Multiplication with MULScc assumes that the Y register initially contains the 
multiplicand, r[rs1] contains the most significant bits of the product, and r[rs2] (or 
simm13) contains the multiplier. Upon completion of the multiplication, the Y reg- 
ister contains the least significant word of the product. The operation of MULScc 
is described in the Programming Considerations chapter. 


Divide-Step 


The divide-step instruction, DIVScc, performs one bit-cycle of a non-restoring, 
shift-before-add, signed or unsigned integer division algorithm. It operates on a 
signed or unsigned dividend, with an unsigned divisor. It uses the integer condi- 
tion code bits to carry the true sign of the remainder, and the previous quotient 
bit, from one cycle to the next. Remainder and quotient are kept in correct relative 
alignment because of the shift-before-add technique. Standard SPARC instruc- 
tions are therefore sufficient for initializing and terminating both signed and 
unsigned division routines, eliminating the need for special divide-initialize, 
divide-terminate or remainder correction instructions. 


Division with DIVScc assumes that the Y register initially contains the most sig-
nificant word of the dividend, r[rs1] contains the least significant word of the div-
idend, and r[rs2] (or simm13) contains the divisor. Upon completion of the
division, the Y register contains the remainder and r[rd] contains the quotient.


When DIVScc is used as expected, it will typically use the same register for rd and
rs1. One exception is a signed division with a one-word dividend, in which the ini-
tial value of r[rs1] is saved in the first divide step by using an rd different from
rs1.





DIVScc operates as follows: 


1. The true sign is formed using the negative (n) and overflow (v) integer condi-
tion codes from the Processor Status Register. True sign = n XOR v.


2. The remainder is formed by upshifting the Y register (initially the most signifi- 
cant word of the dividend) one bit, and setting the least significant bit of 
remainder equal to most significant bit of r[rs1] (initially the least significant 
word of the dividend). 


3. The divisor is r[rs2] if the i field is 0, or simm13, sign-extended to 32 bits, if the i
field is 1.


4. If true sign = 0 (+), the ALU computes remainder - divisor. If true sign = 1 (-), the
ALU computes remainder + divisor.


5. Carry out from the ALU operation is noted as c0. The negative (n) condition 
code is set to bit 31 of the ALU result. The zero (z) condition code is set if the 
ALU result is 0 AND the true sign equals Y[31], else cleared. 


6. The new true sign is formed as (true sign AND NOT Y[31]) OR (NOT c0 AND
(true sign OR NOT Y[31])).


7. The overflow (v) condition code is formed as new true sign XOR bit 31 of the 
ALU result. The carry (c) condition code is set to NOT new true sign. Y is set to 
the 32-bit ALU result. If rd is not 0, then r[rd] is set to r[rs1], upshifted one bit 
with NOT new true sign (the new quotient bit) in the least significant bit posi- 
tion. 
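The seven steps can be modeled in a few lines. The following Python sketch is illustrative, not the manual's own routine, and rests on two assumptions: c0 is the raw carry-out of the 32-bit adder (with subtraction performed as remainder + NOT divisor + 1), and Y[31] refers to bit 31 of the Y register before it is upshifted. The driver udiv64 is a hypothetical unsigned routine written for this example; it requires the high word of the dividend to be smaller than the divisor so that the quotient fits in 32 bits.

```python
MASK32 = 0xFFFFFFFF

def divscc(y, rs1, divisor, n, v):
    """Model one DIVScc step; returns (new_y, rd, n, z, v, c)."""
    true_sign = n ^ v                                    # step 1
    y31 = (y >> 31) & 1                                  # bit shifted out of Y
    remainder = ((y << 1) | (rs1 >> 31)) & MASK32        # step 2
    if true_sign == 0:                                   # step 4
        total = remainder + (~divisor & MASK32) + 1      # remainder - divisor
    else:
        total = remainder + divisor
    c0 = (total >> 32) & 1                               # step 5 (assumed raw carry)
    result = total & MASK32
    new_n = result >> 31
    new_z = int(result == 0 and true_sign == y31)
    not_c0, not_y31 = c0 ^ 1, y31 ^ 1
    new_sign = (true_sign & not_y31) | (not_c0 & (true_sign | not_y31))  # step 6
    new_v = new_sign ^ new_n                             # step 7
    new_c = new_sign ^ 1                                 # also the quotient bit
    rd = ((rs1 << 1) | new_c) & MASK32
    return result, rd, new_n, new_z, new_v, new_c

def udiv64(dividend, divisor):
    """Hypothetical driver: unsigned 64-by-32 division using 32 divide steps."""
    y, lo = dividend >> 32, dividend & MASK32
    n = v = 0                                            # true sign starts positive
    for _ in range(32):
        y, lo, n, z, v, c = divscc(y, lo, divisor, n, v)
    if n ^ v:                                            # negative remainder: correct it
        y = (y + divisor) & MASK32
    return lo, y                                         # (quotient, remainder)
```

In this model, rd and rs1 are the same register on every step, which matches the typical usage described above. The final remainder correction is an ordinary addition, consistent with the statement that no special divide-terminate instruction is needed.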


See the Programming Considerations chapter for sample signed and unsigned divi- 
sion routines based on the DIVScc instruction. 


Shift 


The shift instructions, listed in Table 2-10, perform logical or arithmetic shifts on 
values in r registers. The shift count for these instructions is either a constant (the 
least significant 5 bits of simm13) or variable (the least significant 5 bits of r[rs2]), 
depending on the value in the i field. The least significant 5 bits of the 2's comple-
ment of a shift count are the same as 32 minus the shift count. No shift occurs
when the shift count is 0.


Table 2-10: Shift Instructions


opcode      operation
SLL         Shift Left Logical
SRL         Shift Right Logical
SRA         Shift Right Arithmetic


SLL and SRL fill vacated bit positions with 0's. SRA fills vacated bit positions with
the most significant bit of the r[rs1] operand; that is, SRA treats its result as a
two's-complement number, and sign-extends it to 32 bits. The shift instructions
do not affect the condition codes.


An arithmetic shift left can be effected using the ADDcc instruction. 
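The shift behavior described in this section can be sketched in Python. The function names are illustrative; each takes a 32-bit register image and uses only the least significant 5 bits of the count.

```python
MASK32 = 0xFFFFFFFF

def sll(x, count):
    """Shift left logical: vacated low bits are filled with 0."""
    return (x << (count & 31)) & MASK32

def srl(x, count):
    """Shift right logical: vacated high bits are filled with 0."""
    return (x & MASK32) >> (count & 31)

def sra(x, count):
    """Shift right arithmetic: vacated high bits copy the sign bit."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> (count & 31)) & MASK32
```

Because only 5 bits of the count are used, a count of -5 behaves like a count of 27. Adding a register to itself gives the same result as a left shift by one, which is the basis of the ADDcc technique mentioned above.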


Scan 


The SCAN instruction scans a register from MSB to LSB looking for either the first 
changed bit, first 1 or first 0 depending on the value of the source 2 operand. 
SCAN is a superset to the standard SPARC instruction set. It is decoded in an 
unused opcode and does not affect compliance with the SPARC architecture stan- 
dard. 


The SCAN instruction is useful for supporting operations like floating-point nor- 
malization by finding the number of sign bits in a single processor cycle. Data 
compression schemes like run length encoding execute significantly faster using 
SCAN as well. 


SCAN works by computing the bitwise XOR of r[rs1] with a mask created by
right-shifting r[rs2] by one bit and sign-extending the result. It finds the first 1 in
the result, and writes this bit number to the destination register (r[rd]). Bit num-
bers range from 0 for the most significant bit to 31 for the least significant. If the
two operands are identical, the value 63 is written into r[rd].


Starting with the same number in r[rs1] and r[rs2], SCAN returns the number of 
sign bits. Consider the first example shown in Figure 2-25. Both source registers 
contain 0b00011.... The right-shifted, sign-extended, rs2 value is 0b000011..., and 
the result of the bitwise XOR is 0b0001.... The bit-position of the first 1 in this 
result (counting from zero, from the left) is 3, which is also the number of sign bits 
in the rs1 value. Similarly, example 2 shows the case where the sign bits are ones. 


By using global register 0, which always reads as 0, as the mask operand (rs2), the 
bit position of the first 1 in rs1 can be found, as in the third example shown in 
Figure 2-25. Similarly, by using the immediate value -1, which extends to all 1’s, 
as the mask operand, the bit position of the first 0 in rs1 is found. (See example 4). 
SCAN does not affect the condition codes.


Example 1: finding the first changed bit (the first 1)
r[rs1] = 0b00011...     (source 1)
r[rs2] = 0b00011...     (source 2)
mask   = 0b000011...    (source 2 shifted)
xor    = 0b00010...     (xor of source 1 and mask)
r[rd]  = 3              (bit location of first changed bit)

Example 2: finding the first changed bit (the first 0)
r[rs1] = 0b11100...     (source 1)
r[rs2] = 0b11100...     (source 2)
mask   = 0b111100...    (source 2 shifted)
xor    = 0b00010...     (xor of source 1 and mask)
r[rd]  = 3              (bit location of first changed bit)

Example 3: finding the first 1
r[rs1] = 0b00011...     (source 1)
r[rs2] = 0b00000...     (source 2, immediate value 0 or %g0)
mask   = 0b000000...    (source 2 shifted)
xor    = 0b00011...     (xor of source 1 and mask)
r[rd]  = 3              (bit location of first changed bit)

Example 4: finding the first 0
r[rs1] = 0b10000...     (source 1)
r[rs2] = 0b11111...     (source 2, immediate value -1)
mask   = 0b111111...    (source 2 shifted)
xor    = 0b01111...     (xor of source 1 and mask)
r[rd]  = 1              (bit location of first changed bit)

Figure 2-25. Using the SCAN Instruction
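The four examples can be reproduced with a short model of SCAN. This Python sketch is illustrative; the function name is hypothetical, operands are 32-bit register images, and the sketch assumes that 63 is returned whenever the XOR result contains no 1 bit, which is one reading of the "identical operands" case described above.

```python
MASK32 = 0xFFFFFFFF

def scan(rs1, rs2):
    """Model SCAN: bit number (0 = MSB) of the first 1 in rs1 XOR mask."""
    # Right-shift rs2 by one bit and sign-extend (arithmetic shift).
    mask = (rs2 >> 1) | (rs2 & 0x80000000)
    diff = (rs1 ^ mask) & MASK32
    if diff == 0:
        return 63                    # assumed result when no 1 bit is found
    return 32 - diff.bit_length()    # count of leading zeros
```

With the operand patterns of Figure 2-25 padded with zeros on the right, scan(0x18000000, 0x18000000), scan(0xE0000000, 0xE0000000), and scan(0x18000000, 0) each yield 3, and scan(0x80000000, 0xFFFFFFFF) yields 1.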

Constants 


The SETHI instruction loads a 22-bit immediate constant into an r register. SETHI 
zeroes the 10 least-significant bits of r[rd], and replaces its 22 high-order bits with 
the value from the imm22 field of the instruction. SETHI does not affect the inte- 
ger condition codes. A SETHI instruction with rd = 0 and imm22 = 0 is the SPARC 
(Version 8) definition of a NOP. 
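The effect of SETHI on the destination register can be sketched as follows. The Python model is illustrative; load_constant shows the common SPARC idiom of following SETHI with an OR of the low 10 bits to build a full 32-bit constant, an idiom that is assumed here rather than described in this section.

```python
def sethi(imm22):
    """Model SETHI: imm22 fills bits 31:10, and bits 9:0 are zeroed."""
    return (imm22 & 0x3FFFFF) << 10

def load_constant(value):
    """Build a 32-bit constant as SETHI of the high 22 bits,
    then OR in the low 10 bits (assumed idiom, for illustration)."""
    high = sethi(value >> 10)
    return high | (value & 0x3FF)
```

With rd = 0 and imm22 = 0 the model produces 0, which is consistent with the use of that encoding as a NOP, since a write to global register 0 has no visible effect.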


2.5.4 Control Transfer Instructions 


A control transfer instruction (CTI) is one which changes the value in the Next
Program Counter (nPC) register. There are five basic types of control transfer 
instructions: conditional branches (Bicc), calls (CALL), jumps (JMPL), returns 
from trap (RETT), and conditional traps (Ticc). 


As shown in Table 2-11, the control transfer instructions can be classified accord- 
ing to two criteria: how the target address is calculated, and when the control transfer 
takes place, relative to the CTI. 


Table 2-11: Classification of Control Transfer Instructions 


Control-Transfer Target Address Transfer Time 
Instruction Calculation Relative to CTI 


Bicc PC-relative conditional-delayed 


CALL PC-relative delayed 
JMPL, RETT register-indirect delayed 
Ticc register-indirect-vectored non-delayed 
Three different schemes are used for computing target addresses:


PC-Relative—Adds an address displacement to the current PC value. The disp30 
(CALL) or disp22 (Bicc) field of the instruction specifies the number of words 
to be added to the PC; this number can be positive or negative. The disp value 
is sign-extended, then left-shifted by two bits to create the (byte) address 
displacement. 


Register-Indirect—Adds its two source operands (r[rs1] is always one of the 
operands; the other is r[rs2] when i = 0, and simm13, sign-extended to 32 bits, 
when i = 1). 


Register-Indirect-Vectored—Calculates the target address in two stages: it first 
obtains a trap type by adding 128 to the least significant 7 bits of the sum of its 
two source operands. r[rs1] is always one of the operands; the other is r[rs2] 
when i = 0, and simm13, sign-extended to 32 bits, when i = 1. The trap type 
number is then stored in the tt field of the Trap Base Register. The resulting 
value in the TBR is the target address. 
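The PC-relative and register-indirect-vectored calculations can be sketched in Python. The function names are illustrative. The placement of the tt field in bits 11:4 of the Trap Base Register, with the trap base address in bits 31:12, follows the SPARC Version 8 layout and is an assumption here, since this section does not give the bit positions.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a Python integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)

def pc_relative_target(pc, disp, bits):
    """Target of Bicc (bits=22) or CALL (bits=30): PC + 4 * disp."""
    return (pc + (sign_extend(disp, bits) << 2)) & MASK32

def ticc_trap_type(rs1, op2):
    """Trap type: 128 plus the 7 least significant bits of the operand sum."""
    return 128 + ((rs1 + op2) & 0x7F)

def ticc_target(tbr, rs1, op2):
    """Store the trap type in the tt field; the resulting TBR is the target."""
    # Assumption: TBA occupies TBR bits 31:12 and tt occupies bits 11:4.
    return (tbr & 0xFFFFF000) | (ticc_trap_type(rs1, op2) << 4)
```

A disp22 of 0x3FFFFF is -1, so a branch at 0x1000 with that displacement targets 0xFFC, one word back.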


Control transfer can either occur immediately after the CTI, or be delayed. The 
control transfer instructions fall into three classes: 


Delayed—Transfers control to the target address after a one-instruction delay. 
The delay instruction—the one whose address is in the nPC register when a 
delayed CTI is executed—is executed before the transfer of control to the 
target address. Special care is required when the delay instruction is itself a 
CTI; see the section on Delayed-Control Transfer Couples, below. 


Non-Delayed—Transfers control to the target address immediately after the 
CTI is executed. 


Conditional-Delayed—Causes either a delayed or a non-delayed transfer of 
control, depending on two things: the value of the a (annul) bit in the 
instruction, and on whether or not the transfer itself is conditional. Details are 
provided below, under the heading Branches. 


Branches 


The Bicc instructions, listed in Table 2-12, perform program branches, either 
unconditionally or conditioned on the current values of the integer condition 
codes (bits 23-20 of the Processor Status Register). The branch target is specified 
by a PC-relative displacement. 
Table 2-12: Branch Instructions


opcode   cond   operation                                                  icc test
BA       1000   Branch Always
BN       0000   Branch Never
BNE      1001   Branch on Not Equal                                        not Z
BE       0001   Branch on Equal                                            Z
BG       1010   Branch on Greater                                          not (Z or (N xor V))
BLE      0010   Branch on Less or Equal                                    Z or (N xor V)
BGE      1011   Branch on Greater or Equal                                 not (N xor V)
BL       0011   Branch on Less                                             N xor V
BGU      1100   Branch on Greater Unsigned                                 not (C or Z)
BLEU     0100   Branch on Less or Equal Unsigned                           (C or Z)
BCC      1101   Branch on Carry Clear (Greater than or Equal, Unsigned)    not C
BCS      0101   Branch on Carry Set (Less than, Unsigned)                  C
BPOS     1110   Branch on Positive                                         not N
BNEG     0110   Branch on Negative                                         N
BVC      1111   Branch on Overflow Clear                                   not V
BVS      0111   Branch on Overflow Set                                     V


The unconditional branch BA causes a PC-relative delayed control transfer, 
regardless of the integer condition code values. If the a (annul) field is 0, the delay 
instruction is executed; if the a field is 1, the delay instruction is annulled (not exe- 
cuted). 


The unconditional branch BN does not cause a transfer of control. BN acts like a 
NOP when its a (annul) field is 0. When its a (annul) field is 1, the following 
instruction (i.e., the delay instruction) is annulled. 


The Bicc instructions other than BA and BN perform conditional branches, based 
on the current values of the integer condition codes. The test condition is coded 
into the cond field of the instruction, as shown in Table 2-12. If the test condition 
evaluates as true, the branch is taken; otherwise, no transfer of control takes place.


If a conditional branch is taken, the delay instruction is always executed, no mat- 
ter what the value of the a (annul) field. If a conditional branch is not taken, and 
the a (annul) field is 1, then the delay instruction is annulled. 
Table 2-13 summarizes the conditions under which the delay instruction is exe-
cuted, for the various types of branches.


Table 2-13: Conditions for Executing Delay Instructions


a bit    type of branch            delay instruction executed?
a = 0    unconditional             YES
         conditional, taken        YES
         conditional, not taken    YES
a = 1    unconditional             NO (annulled)
         conditional, taken        YES
         conditional, not taken    NO (annulled)
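The annul rules reduce to a short decision function. This Python sketch is illustrative, with a hypothetical function name; "unconditional" covers BA and BN, and "taken" matters only for conditional branches.

```python
def delay_instruction_executes(unconditional, taken, annul):
    """Return True if the delay instruction of a Bicc is executed."""
    if not annul:
        return True          # a = 0: the delay instruction always executes
    if unconditional:
        return False         # a = 1 on BA or BN: always annulled
    return taken             # a = 1, conditional: annulled only if not taken
```

The case of a = 1 on a conditional branch is the useful one for loops: the delay instruction runs only when the branch is taken.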





The effect of a branch instruction on the processor pipeline is shown in 
Figure 2-26. 


Delay instruction may be annulled, in which case it is treated as a NOP


CLK
Fetch       br  delay  target  inst1
Decode      br  delay  target  inst1
Execute     br  delay  target  inst1
Memory      br  delay  target  inst1
Write-Back  br  delay  target  inst1


Figure 2-26. Pipeline Sequence: Branch


Call and Link


The CALL instruction writes the contents of the PC (i.e., the address of the CALL 
itself) into out register 7 (r[15]) of the current window. It then causes a delayed 
control transfer to a PC-relative target address. The instruction field that specifies 
the address displacement is 30 bits wide, so CALL can be used to transfer control 
anywhere in the address space. The call instruction pipeline sequence is identical 
to Figure 2-26, except that the delay instructions cannot be annulled. 


Jump and Link 


The JMPL instruction writes the contents of the PC (i.e., the address of the JMPL
itself) into the destination register r[rd]. It then causes a delayed control transfer to
a register-indirect target address. If the target address is not word-aligned, a
mem_address_not_aligned trap occurs.
Forced “no operation”


CLK
Fetch       jmpl  delay  nop  target  inst1
Decode      jmpl  delay  nop  target  inst1
Execute     jmpl  delay  nop  target  inst1
Memory      jmpl  delay  nop  target  inst1
Write-Back  jmpl  delay  nop  target  inst1


Figure 2-27. Pipeline Sequence: Jump and Link 


Return from Trap 


Unless it causes a trap, the RETT instruction does four things: it increments the
Current Window Pointer (modulo 8), causes a delayed control transfer to the regis-
ter-indirect target address, restores the processor to the operating mode (user or
supervisor) it was in before the trap was taken, and enables traps.


If traps are enabled (i.e., if the ET bit of the Processor Status Register is set to 1), 
RETT will always cause a trap. A privileged_instruction trap will occur if the pro- 
cessor is in user mode, and an illegal_instruction trap will occur if the processor is 
in supervisor mode. 


If traps are disabled (ET = 0), RETT can cause the following traps, in decreasing 
order of priority: 
• Privileged_instruction, if the processor is in user mode.


• Window_underflow, if the new CWP corresponds to a set bit in the Window
Invalid Mask register.


• Mem_address_not_aligned, if the target address of the control transfer is not
word-aligned.


In these cases, the processor will write the appropriate trap type number into the
tt field of the TBR, enter the error state, and halt.


Forced “no operation” 


CLK 
Fetch       jmpl  rett  nop  target  inst1
Decode      jmpl  rett  nop  target  inst1
Execute     jmpl  rett  nop  target  inst1
Memory      jmpl  rett  nop  target  inst1
Write-Back  jmpl  rett  nop  target  inst1


Figure 2-28. Pipeline Sequence: RETT 
Software Traps


The Ticc instructions, listed in Table 2-14, generate the trap_instruction trap, 
either unconditionally or conditioned on the current values of the integer condi- 
tion codes (bits 23-20 of the Processor Status Register). Ticc can be used to imple- 
ment breakpoints, traces, and system calls. It can also be used for run-time checks, 
such as out-of-range array indexes or integer overflow. 


Table 2-14: Trap Instructions


opcode   cond   operation                                                icc test
TA       1000   Trap Always
TN       0000   Trap Never
TNE      1001   Trap on Not Equal                                        not Z
TE       0001   Trap on Equal                                            Z
TG       1010   Trap on Greater                                          not (Z or (N xor V))
TLE      0010   Trap on Less or Equal                                    Z or (N xor V)
TGE      1011   Trap on Greater or Equal                                 not (N xor V)
TL       0011   Trap on Less                                             N xor V
TGU      1100   Trap on Greater Unsigned                                 not (C or Z)
TLEU     0100   Trap on Less or Equal Unsigned                           (C or Z)
TCC      1101   Trap on Carry Clear (Greater than or Equal, Unsigned)    not C
TCS      0101   Trap on Carry Set (Less than, Unsigned)                  C
TPOS     1110   Trap on Positive                                         not N
TNEG     0110   Trap on Negative                                         N
TVC      1111   Trap on Overflow Clear                                   not V
TVS      0111   Trap on Overflow Set                                     V


The Ticc instructions evaluate a boolean test condition based on the current val- 
ues of the integer condition codes. The test condition is coded into the cond field of 
the instruction, as shown in Table 2-14. If the test condition evaluates as true, and 
no higher-priority trap or interrupt request is pending, the trap_instruction trap is 
generated. Otherwise, the instruction behaves like a NOP. The test condition for 
TA always evaluates as true; the condition for TN always evaluates as false.


When Ticc generates a trap, the trap type is written into the tt field of the Trap 
Base Register. The trap type is calculated by adding 128 to the seven least signifi-
cant bits of the sum of the two instruction operands. Register r[rs1] is always one
of the operands; the other is r[rs2] when i = 0, and simm13, sign-extended to 32 
bits, when i = 1. The 25 most significant bits of r[rs2], or the 6 most significant bits 
of simm13, are unused and should be supplied as 0 by software. 


Control is then transferred to the address in the TBR. The processor enters super- 
visor mode, disables traps, decrements the CWP (modulo 8), and saves the PC 
and nPC into r[17] and r[18] (local registers 1 and 2) of the new window. See the 
section on Interrupts and Traps, below. 
Delayed Control-Transfer Couples


When a delayed control-transfer instruction is followed by another control-trans-
fer instruction, the pair of CTIs is called a delayed control-transfer couple (DCTI
couple). The order of execution for DCTI couples is illustrated by the examples in
Table 2-15.


Table 2-15: Order of Execution for Delayed Control-Transfer Couples


Case   12: CTI 40           16: CTI 60             Order of Execution by Address
1      DCTI unconditional   DCTI taken             12, 16, 40, 60, 64,...
2      DCTI unconditional   B*cc (a=0) untaken     12, 16, 40, 44,...
3      DCTI unconditional   B*cc (a=1) untaken     12, 16, 44, 48,... (40 annulled)
4      DCTI unconditional   B*A (a=1)              12, 16, 60, 64,... (40 annulled)
5      B*A (a=1)            any CTI                12, 40, 44,... (16 annulled)
6      B*cc                 DCTI                   12, 16, 40, 60, 64, 68,...



































Note: Where the a bit is not indicated above, it may be either 0 or 1. See next table for abbreviations. 


Abbreviations used in Previous Table 


Abbreviation Refers to Instructions 


B*cc Bicc (including BN, but excluding BA) 
DCTI unconditional CALL, JMPL, RETT, or BA (with a=0) 
DCTI taken CALL, JMPL, RETT, BA (with a=0), or B*cc taken 










In the first five cases in Table 2-15, the first instruction causes an unconditional 
control transfer. Common examples of such DCTI couples are the JMPL, RETT 
sequences that can be used to return from a trap handler. In Case 6, the first 
instruction is a conditional branch; the order of execution is implementation- 
dependent. 


2.5.5 Load and Store Instructions 


The load and store instructions are the only ones that access memory and I/O, 
allowing bytes, half-words, words and doublewords to be transferred to and from 
processor registers. 


Addressing modes are few and simple: the effective memory address is r[rs1] + 
r[rs2] when i = 0, and r[rs1] + (simm13, sign-extended to 32 bits) when i= 1. The 
destination field, rd, specifies the register that supplies the data for a store, or 
receives it for a load. 


The SPARC addressing convention is big-endian: the address of a halfword, 
word, or doubleword is the address of its most significant byte; increasing the 
address generally decreases the significance of the unit being addressed. 


Attempts at unaligned accesses are trapped. An aligned doubleword address is 
divisible by 8, an aligned word address is divisible by 4, and an aligned half-word 
address is divisible by 2. If a load or store instruction generates an improperly 
aligned address, a mem_address_not_aligned trap occurs, and the access must
be performed piecemeal under software control. 
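The alignment rule and the big-endian byte order can be sketched together. This Python model is illustrative: memory is represented as a bytearray, the function names are hypothetical, and a ValueError stands in for the alignment trap.

```python
def is_aligned(address, size):
    """An access of `size` bytes (2, 4, or 8) is aligned if size divides the address."""
    return address % size == 0

def store_word_big_endian(memory, address, value):
    """Store a 32-bit word with its most significant byte at `address`."""
    if not is_aligned(address, 4):
        # Stand-in for the alignment trap taken by the processor.
        raise ValueError("mem_address_not_aligned")
    memory[address:address + 4] = value.to_bytes(4, "big")
```

Storing 0x11223344 at address 4 places 0x11 at address 4 and 0x44 at address 7, so increasing addresses hold bytes of decreasing significance.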


When performing an access, the processor generates an 8-bit Address Space Iden- 
tifier along with the address. The ASI assignments for SPARClite are shown in 
Figure 1-1 in the Overview chapter. For a normal load or store instruction, the IU 
automatically supplies an ASI of 0x0A (user data space) or 0x0B (supervisor data
space), depending on the current operating mode of the processor. 


Privileged instructions exist for accessing the other address spaces. These instruc-
tions supply the Address Space Identifier explicitly in their asi fields. The “register
+ immediate” addressing mode is not available for these instructions; they cause 
an illegal_instruction trap if their i field is set to 1. 


Load 


The load integer instructions, shown in Table 2-16, copy data from memory into 
general-purpose registers. Bytes, half-words and words are copied into the desti- 
nation register r[rd]. Doublewords are copied into an even-next odd r-register 
pair. 


Table 2-16: Load Instructions


opcode     operation
LDSB       Load Signed Byte
LDSH       Load Signed Halfword
LDUB       Load Unsigned Byte
LDUH       Load Unsigned Halfword
LD         Load Word
LDD        Load Doubleword
LDSBA†     Load Signed Byte from Alternate space
LDSHA†     Load Signed Halfword from Alternate space
LDUBA†     Load Unsigned Byte from Alternate space
LDUHA†     Load Unsigned Halfword from Alternate space
LDA†       Load Word from Alternate space
LDDA†      Load Doubleword from Alternate space

†. Privileged instruction.


Fetched bytes and halfwords are right-justified in the destination register r[rd], 
and either sign-extended or zero-extended on the left, depending on whether the 
load is signed or unsigned. 


For a doubleword load, the effective memory address is that of the most signifi- 
cant word. This word is copied into the even-numbered register r[rd]; the last bit 
of the rd field is ignored, and should be supplied as 0. The least significant word is
copied from the effective memory address + 4 into the following odd-numbered r 
register. A successful doubleword load operates atomically. 


Stalled instructions


CLK
Fetch       ldd  inst 1  inst 2
Decode      ldd  inst 1  inst 2
Execute     ldd  ldd(q)  inst 1  inst 2
Memory      ldd  ldd(q)  inst 1  inst 2
Write-Back  ldd  ldd(q)  inst 1  inst 2

Figure 2-29. Pipeline Sequence: Load Double 

Store 


The store integer instructions, shown in Table 2-17, copy data from r registers into 
memory. Bytes, half-words and words are copied from the register r[rd]. Double- 
words are copied from an even-odd r register pair. 


Table 2-17: Store Instructions


opcode     operation
STB        Store Byte
STH        Store Halfword
ST         Store Word
STD        Store Doubleword
STBA†      Store Byte into Alternate space
STHA†      Store Halfword into Alternate space
STA†       Store Word into Alternate space
STDA†      Store Doubleword into Alternate space

†. Privileged instruction.


Byte (and halfword) stores take their data from the least significant byte (or half- 
word) of the register r[rd]. 


For a doubleword store, the effective memory address is that of the most signifi- 
cant word. This word is copied from the even-numbered register r[rd]; the last bit 
of the rd field is ignored, and should be supplied as 0. The least significant word is 
copied from the following odd-numbered r register to the effective memory 
address + 4. A successful doubleword store operates atomically. 
Atomic Load-Store


The atomic load-store instructions, LDSTUB and LDSTUBA, copy a byte from
memory into r[rd], and then rewrite the addressed byte with the value 0xFF. Inter-
rupts and deferred traps cannot separate the load operation from the store.


Table 2-18: Atomic Load-Store Instructions


opcode      operation
LDSTUB      Atomic Load-Store Unsigned Byte
LDSTUBA†    Atomic Load-Store Unsigned Byte into Alternate space

†. Privileged instruction.


Swap


The SWAP and SWAPA instructions exchange the contents of r[rd] and the
addressed memory location. Interrupts and deferred traps are not permitted to
intervene.


Table 2-19: Swap Instructions


opcode      operation
SWAP        SWAP r register with memory
SWAPA†      SWAP r register with Alternate space memory

†. Privileged instruction.









2.5.6 Read and Write Control Register Instructions 


These instructions access the SPARC control and status registers. Except for SAVE 
and RESTORE, each one reads or writes the contents of an entire register. SAVE 
and RESTORE decrement and increment (respectively) the Current Window Pointer
field of the Processor Status Register.


Read Control Register 


Each of the instructions shown in Table 2-20 copies data from a particular SPARC
register into the destination register r[rd]. 


Table 2-20: Read Control Register Instructions


opcode     operation
RDASR†     Read Ancillary State Register
RDY        Read Y Register
RDPSR†     Read Processor State Register
RDWIM†     Read Window Invalid Mask Register
RDTBR†     Read Trap Base Register

†. Privileged instruction.
The rs1 field of the RDASR instruction specifies which Ancillary State Register
(ASR) is to be read. In SPARClite, only ASR16 and ASR17 are implemented. 
Attempts to read any other ASR result in an illegal_instruction trap. 





Write Control Register 


Each of the instructions shown in Table 2-21 copies data into the writable fields of
a particular SPARC register. The data to be written is calculated as the bitwise
XOR of the two source operands. Register r[rs1] is always one of the sources; the
other is r[rs2] when i = 0, and simm13, sign-extended to 32 bits, when i = 1.
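Because the written value is an XOR rather than a copy, it is worth seeing how the two operands combine. The Python sketch below is illustrative, with a hypothetical function name; it models only the value calculation, not the delayed write.

```python
MASK32 = 0xFFFFFFFF

def wr_value(rs1, op2):
    """Value written by a write control register instruction: rs1 XOR op2."""
    return (rs1 ^ op2) & MASK32
```

To write a register value unchanged, the second operand can be 0 (for example global register 0, which always reads as 0). A non-zero second operand toggles the corresponding bits of the first.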


The write control register instructions cause delayed writes. In a delayed write, the 
new value of the register is not available for some number of instructions after the 
write instruction. Table 2-21 shows the number of delay instructions for the 
SPARClite family processors. (Note: The SPARC architecture allows the write delay
to be as long as 3 instructions. If code must remain compatible with all implemen-
tations of SPARC, the maximum delay should be assumed.)


Table 2-21: Write Control Register Instructions


opcode     operation                               write delay (cycles)
WRASR†     Write Ancillary State Register
WRY        Write Y Register
WRPSR†     Write Processor State Register
WRWIM†     Write Window Invalid Mask Register
WRTBR†     Write Trap Base Register

†. Privileged instruction.


Attempts to use or modify the contents of a register (except for the Y Register), 
after writing to it with a write control register instruction, have the following 
results: 


1. Writing to any field of the same register within the write delay makes the con- 
tents of that field undefined. 


Exception: A second instance of the same write control register instruction, 
even if it follows within three instructions of the first, will write the register as 
intended. 


Note that many instructions implicitly write fields (Current Window Pointer, Inte-
ger Condition Codes) of the Processor Status Register: the logical and arith-
metic instructions whose mnemonics end in “cc”; SAVE and RESTORE; Ticc
(when taken); and CALL.
2. Reading any changed field of the same register within the write delay yields an
unpredictable value. 


Note that many instructions implicitly read fields of the PSR: ADDX, SUBX, 
MULScc, DIVScc; SAVE and RESTORE; Bicc and Ticc. 


3. If either of the two instructions following a write control register instruction
causes a trap, a read control register instruction in the trap handler will get the 
register’s new value. 


If either of the two instructions following a WRTBR causes a trap, the Trap Base
Address used will be the new value of the TBA field. 


If either of the two instructions following a WRPSR causes a trap, the values of
the S and CWP fields read from the PSR while taking the trap will be the new 
values. 


WRPSR appears to write the ET and PIL fields immediately with respect to inter- 
rupts. 


If a WRPSR instruction would cause the CWP field of the Processor Status Regis-
ter (PSR) to point to an unimplemented window, it causes an illegal_instruction
trap instead, and does not modify the PSR in any way. 


The rs1 field of the WRASR instruction specifies which Ancillary State Register 
(ASR) is to be written. In SPARClite, only ASR17 is implemented. Attempts to
write any other ASR result in an illegal_instruction trap. 


Modify Current Window Pointer


The SAVE instruction decrements the Current Window Pointer (CWP) field of the 
Processor Status Register, thus saving the caller’s window. The RESTORE instruc- 
tion increments the CWP, restoring the caller’s window. CWP arithmetic is per- 
formed modulo 8, the number of implemented windows. 


If the new CWP value corresponds to a bit of the Window Invalid Mask register 
that is set to 1, a trap is generated: the window_overflow trap for a SAVE, and the 
window_underflow trap for a RESTORE. 


If a trap is not generated, then, besides modifying the CWP, both SAVE and 
RESTORE act like integer addition instructions. The source operand fields rs1 and 
(when i = 0) rs2 are interpreted as register addresses in the old window, while 
destination field rd is interpreted as a register address in the new window. 


The SAVE instruction can be used to allocate a new window in the register file, 
and a new software stack frame in memory, in a single atomic operation. See the 
Programming Considerations chapter for details. 
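The CWP arithmetic and the Window Invalid Mask check can be sketched as follows. This Python model is illustrative, with hypothetical names; it assumes that bit n of the Window Invalid Mask corresponds to window n, as in the SPARC Version 8 definition, and it returns the trap name instead of modeling the trap itself.

```python
NWINDOWS = 8   # number of implemented windows, per the text above

def new_cwp(cwp, wim, is_save):
    """Return (cwp, trap) after a SAVE (is_save=True) or a RESTORE."""
    step = -1 if is_save else 1
    candidate = (cwp + step) % NWINDOWS
    # Assumption: WIM bit n marks window n as invalid (SPARC V8 convention).
    if (wim >> candidate) & 1:
        trap = "window_overflow" if is_save else "window_underflow"
        return cwp, trap           # CWP is left unchanged when the trap occurs
    return candidate, None
```

A SAVE from window 0 wraps around to window 7. If bit 7 of the mask is set, the model reports window_overflow instead.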
2.6 Data and Instruction Caches


Each member of the SPARClite family contains separate data and instruction 
caches on-chip. The caches are designed for maximum flexibility of operation. 
Under software control, individual entries or entire banks can be locked. The data 
cache can be decoupled from external memory and used as a fast on-chip scratch- 
pad RAM. This section discusses the structure and operation of the caches, as 
seen from the programmer’s point of view. 





2.6.1 Structure 


In the MB86930 processor, each cache is 2 Kbytes in size, divided into 128 lines of
4 words (16 bytes) each. The contents of the cache data memory and tag memory
are undefined at reset.


The cache organization, illustrated in Figure 2-30, is two-way set associative; that is,
each address in memory can be cached in either of two locations. Each cache is
divided into two banks, with 64 lines per bank. The 64 pairs of lines are called sets.
On a cache access, the address bits ADR[9:4] are used to select a set; the
corresponding data or instruction values can be in either bank.


Figure 2-30. Cache Organization


Associated with each cache line is a tag, which indicates the memory location to
which the line is currently mapped, and contains status information for the
cached data or instructions. Data cache tags are located in the address space with
ASI 0xE, and instruction cache tags in the address space with ASI 0xC (see
Table 2-22). A cache entry consists of a cache line together with the corresponding
tag. The structure of a cache tag is illustrated in Figure 2-31.


Address Tag (RST=Undefined)
Sub Block Valid (Valid=1, Invalid=0, RST=Undefined)
User/Supervisor (User=0, Supervisor=1, RST=Undefined)
Least Recently Used (RST=Undefined)
Entry Lock (Locked=1, Unlocked=0, RST=Undefined)


Figure 2-31. Cache Tag


Bits 31-10: Address Tag—Contains the 22 most significant bits of the memory address of the data or 
instructions cached in the corresponding line. Undefined on reset. 


Bits 9-6: Sub-Block Valid—Contains one Valid bit for each of the 4 words in the corresponding line. 
When a Valid bit is 1, it indicates that the corresponding cache word contains a current 
data or instruction value for the address indicated by the tag. Undefined on reset. 


Bit 5: User/Supervisor—indicates whether the data or instructions cached in the corresponding 
line come from user space (User/Supervisor bit = 0) or from supervisor space
(User/Supervisor bit = 1). Undefined on reset.


Bits 4-3: Reserved 


Bit 1: Least Recently Used (Bank 1 Only)—Indicates, for a given set, which bank contains the 
least recently used entry. When this bit is 1, it indicates that the entry in Bank 1 was the 
least recently used. Otherwise, Bank 2 was the least recently used. The value of this bit 
determines which of the two entries is replaced when a new line needs to be allocated, 
and both entries are valid. Undefined on reset. 


Bit 0: Entry Lock—Locks the current address into the cache tag entry. An access which com- 
petes with currently locked entries in both banks of the cache is treated as non-cache- 
able. Undefined on reset. 
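

The tag layout listed above can be unpacked with a few shifts and masks. The following Python sketch is illustrative only; the function name decode_tag and the dictionary keys are assumptions made for the example, and the Sub-Block Valid bits are returned as a single 4-bit field because the mapping of individual Valid bits to words is not spelled out here.

```python
def decode_tag(tag_word):
    """Split a 32-bit cache tag word into the fields listed for Figure 2-31."""
    return {
        "address_tag": (tag_word >> 10) & 0x3FFFFF,  # bits 31-10
        "sub_block_valid": (tag_word >> 6) & 0xF,    # bits 9-6
        "supervisor": (tag_word >> 5) & 1,           # bit 5
        "lru": (tag_word >> 1) & 1,                  # bit 1
        "entry_lock": tag_word & 1,                  # bit 0
    }
```

Writing a whole tag word back requires re-assembling all of these fields, which is why the separate Tag Lock Bit addresses are convenient when only the lock bit needs to change.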


A faster way to set and clear the tag entry-lock bits is to write the Tag Lock Bit
addresses as shown in Table 2-22. Writes to these locations map to the same entry
lock bits in the instruction and data cache tags described in Figure 2-31 above. The
advantage of writing the entry lock bit using these alternate memory locations is
that only the lock bit is affected on a write; the rest of the associated tag is not
affected. The same operation using the cache tag address would require a
read-modify-write so as not to change the rest of the tag value.


Entry Lock (Locked=1, Unlocked=0, RST=Undefined)


Figure 2-32. Tag Lock Bit


Bit 0: Entry Lock - Locks the current address into the cache tag entry. An access which
competes with a currently locked entry in the cache is treated as non-cacheable. Writing this
bit has the same effect as writing the corresponding bit in the cache tags except that the
rest of the tag remains unaffected by a write to this location.


Table 2-22: Cache Tag Addresses

Instruction Cache    Data Cache
Cache Tag Address    Cache Tag Address    Tag Lock Bit     Tag Lock Bit
ASI=0xC              ASI=0xE              ASI=0x2          ASI=0x3
-----------------    -----------------    -------------    -------------
0x0000 0000          0x0000 0000          0x0000 0000      0x0000 0000
0x0000 0010          0x0000 0010          0x0000 0010      0x0000 0010
0x0000 0020          0x0000 0020          0x0000 0020      0x0000 0020
0x0000 0030          0x0000 0030          0x0000 0030      0x0000 0030
0x0000 0040          0x0000 0040          0x0000 0040      0x0000 0040
...                  ...                  ...              ...
0x0000 0400          0x0000 0400          0x0000 0400      0x0000 0400

0x8000 0000          0x8000 0000          0x8000 0000      0x8000 0000
0x8000 0010          0x8000 0010          0x8000 0010      0x8000 0010
0x8000 0020          0x8000 0020          0x8000 0020      0x8000 0020
0x8000 0030          0x8000 0030          0x8000 0030      0x8000 0030
0x8000 0040          0x8000 0040          0x8000 0040      0x8000 0040
...                  ...                  ...              ...
0x8000 0400          0x8000 0400          0x8000 0400      0x8000 0400


2.6.2 Operation


This section discusses software initialization of the caches and the various cache
operating modes.


Initialization


On reset, both caches are turned off, and all memory requests are sent to the Bus 
Interface Unit. In order to use the caches, software must initialize the Valid, Least 
Recently Used and Entry Lock bits by writing 0’s to the appropriate alternate 
address spaces. After initializing the cache, a program can write 1’s to the Cache 
Enable bits of the Cache/BIU control register to turn the caches on. Due to the 
pipeline in the IU, all writes are delayed by three instruction cycles. 
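

The initialization pass described above amounts to writing zero to every tag of both caches. The Python sketch below only enumerates the addresses involved; it is not target code. The helper write_alternate is a hypothetical stand-in for a store to an alternate address space (an STA instruction on the target), and the assumption that each bank holds 64 tags at offsets 0x000 through 0x3F0 from the two base addresses in Table 2-22 follows from the 64-lines-per-bank organization rather than from an explicit statement.

```python
ICACHE_TAG_ASI = 0xC   # instruction cache tags (from the text)
DCACHE_TAG_ASI = 0xE   # data cache tags (from the text)
BANK_BASES = (0x00000000, 0x80000000)  # the two address groups of Table 2-22
LINES_PER_BANK = 64    # assumption: one tag per line, 64 lines per bank
LINE_STRIDE = 0x10     # tag addresses advance by 16 in Table 2-22


def tag_addresses():
    """Return the tag addresses of one cache, first group then second."""
    return [base + line * LINE_STRIDE
            for base in BANK_BASES
            for line in range(LINES_PER_BANK)]


def clear_tags(write_alternate):
    """Write 0 to every tag of both caches.

    write_alternate(asi, address, value) is a hypothetical callback that
    performs the actual store; writing 0 clears the Valid, Least Recently
    Used and Entry Lock bits along with the rest of the tag.
    """
    for asi in (ICACHE_TAG_ASI, DCACHE_TAG_ASI):
        for address in tag_addresses():
            write_alternate(asi, address, 0)
```

Only after a pass like this would software set the Cache Enable bits, allowing for the three-cycle write delay mentioned above.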


Normal Operation 


Accesses to the user and supervisor data spaces, and fetches from the user and 
supervisor instruction spaces, are generally cacheable. Stores to the instruction 
address space are not supported. Loads and stores to alternate memory spaces are 
not cacheable. I/O registers and other locations that need to be prevented from 
being cached should therefore be mapped to an alternate space. Atomic load/store
transactions, including the SWAP instruction, are not cacheable. If an atomic
operation references data already in cache, the entry for that data will be
invalidated.


On any cacheable access, the address bits ADR[9:4] are used to select a set in the 
appropriate cache. Address bits ADR[3:2] are used to select a word from each of 
the two lines in the set; the Valid bits corresponding to those words are checked. 
The address bits ADR[31:10] are compared with the address tags. The User/ 
Supervisor bit is tested against the ASI indicated by the IU. 
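

The address fields used in this lookup can be extracted as follows. This is a small illustrative sketch in Python; the function name split_address is an assumption made for the example, while the bit ranges are the ones given in the paragraph above.

```python
def split_address(adr):
    """Split a 32-bit address into (tag, set_index, word)."""
    tag = (adr >> 10) & 0x3FFFFF   # ADR[31:10], compared with the address tags
    set_index = (adr >> 4) & 0x3F  # ADR[9:4], selects one of the 64 sets
    word = (adr >> 2) & 0x3        # ADR[3:2], selects a word within the line
    return tag, set_index, word
```

Two addresses that differ only in ADR[31:10] map to the same set, which is where the two-way organization lets both be cached at once.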


A cache hit occurs if all of the following are true; otherwise, a cache miss occurs:

• ADR[31:10] matches the address tag of one of the two entries in the selected set.

• The User/Supervisor bit corresponds to the ASI indicated by the IU.

• The Valid bit corresponding to the word being accessed is 1.
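

The three hit conditions can be expressed as a single predicate. The Python sketch below is illustrative only: the function name is_cache_hit and the dictionary representation of a tag are assumptions made for the example, as is the mapping of bit n of the 4-bit Valid field to word n of the line, which this section does not state.

```python
def is_cache_hit(adr, supervisor, set_entries):
    """Return True if the access hits in either entry of the selected set.

    adr is the 32-bit address, supervisor is 1 for a supervisor-space
    access and 0 for a user-space access, and set_entries holds the two
    tags of the set, each a dict with the keys "address_tag",
    "supervisor" and "valid" (a 4-bit field; bit n is assumed to cover
    word n of the line).
    """
    tag = (adr >> 10) & 0x3FFFFF
    word = (adr >> 2) & 0x3
    for entry in set_entries:
        if (entry["address_tag"] == tag
                and entry["supervisor"] == supervisor
                and (entry["valid"] >> word) & 1):
            return True
    return False
```

A tag match with the Valid bit clear is still a miss, which is the case taken up in the read-miss discussion below.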


In the case of a read hit, the requested data or instruction is in the cache. The data 
or instruction is returned to the IU, and the pipeline is not held up. The LRU bit is 
updated. The lock bit may be updated based on the value of the Cache Entry Auto 
Lock bit in the Lock Control Register (see Locking Modes, below). 


A read miss freezes the IU pipeline, and sends the request on to external memory. 
Though each cache line is four words long, only a single word is fetched on a 
miss. Assuming neither global nor local locking is in force, the fetched word will 
overwrite the appropriate word in one of the entries in the set. (Under global or 
local locking, a different policy is followed; see Locking Modes, below). 


Sometimes a read miss occurs only because the Valid bit for the requested word is
not set. In this case, a cache line has already been allocated for a 4-word memory
block which includes the requested address. The fetched word simply overwrites
the appropriate word in this line; the Valid bit for the word is then set.


Otherwise, a new line needs to be allocated on a read miss, and one of the two
entries in the set corresponding to the requested address must be selected for
replacement. The least recently used entry, as determined by the Least Recently
Used bit for the set, is replaced. The fetched word overwrites the appropriate
word in this line; its Valid bit is then set, and the Valid bits for the other words in
the line are cleared.


The data cache follows a write-through memory update policy. On a write hit, the 
data is written both to the cache and to main memory (write-through). If there is a 
write miss, the data is written only to the external memory (no write-allocate). (A 
different policy is followed if the write is to a locked location; see Locking Modes, 
below.) 


Locking Modes 


Without locking, read misses can cause cache lines to be re-allocated. Entire 
caches, or selected entries corresponding to time-critical routines, however, can 
be locked into cache. Locked entries cannot be re-allocated. Thanks to the set- 
associative organization, one bank of each cache can continue to operate as a fully 
functional direct-mapped cache, no matter how many entries in the other bank 
are locked. 


On a read miss, if one of the entries in the addressed set is locked, the unlocked 
one is re-allocated, whether or not it was the least recently used. If both entries, or 
the entire cache, are locked, then the access will be treated as non-cacheable. 
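

The replacement choice on a read miss, including the effect of locked entries, can be summarized in a few lines. This Python sketch is illustrative; the function name choose_victim and its boolean arguments are assumptions made for the example, and global locking is not modelled.

```python
def choose_victim(bank1_locked, bank2_locked, lru_is_bank1):
    """Pick the bank whose entry is re-allocated on a read miss.

    Returns 1 or 2, or None when both entries are locked and the access
    is therefore treated as non-cacheable. lru_is_bank1 reflects the
    Least Recently Used bit of the set (1 means Bank 1 was least
    recently used).
    """
    if bank1_locked and bank2_locked:
        return None
    if bank1_locked:
        return 2  # the unlocked entry is used regardless of LRU
    if bank2_locked:
        return 1
    return 1 if lru_is_bank1 else 2
```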


Writes to locked data entries, moreover, are not written through to main memory. 
In this way, a portion of the data cache can be used as fast on-chip RAM which is 
not mapped to external memory. 


There are two modes of cache locking: 


• Global Locking: Affects an entire cache. When a cache is locked in this way,
valid entries are not replaced; invalid words in allocated cache locations will
be updated. Bits in the Cache/Bus Interface Unit Control Register enable or
disable the global locking mode independently for each cache. Enabling global
locking does not affect the Entry Lock bits of individual cache lines; when
global locking is subsequently disabled, lines with clear Entry Lock bits are
once again subject to re-allocation.


• Local Locking: Affects individual cache lines. Bits in the Lock Control Register
enable or disable, independently for each cache, an auto lock mode in which all
subsequent cache accesses automatically set the Entry Lock bit of the accessed
entry. Software can also lock and unlock an individual entry by writing the lock
bit in that entry's tag.


With auto-locking enabled for either the instruction or data cache, any lines
accessed in that cache have their entry-lock bit set. This makes it easy to lock a
routine into the cache by setting the auto lock bit in the Lock Control Register at
the beginning of the routine and then executing the routine to lock the entries.
The auto lock bit is cleared in one of two ways. Normally, software clears the auto
lock bit at the end of the routine being locked. If a trap or interrupt occurs, the auto
lock bit is cleared by hardware. This disables the locking mechanism so that
the service routine is not locked into cache by mistake.


Two registers are provided to make it easy to re-enable the auto locking when the 
processor returns from the interrupt. The value of the Lock Control Register 
before the interrupt is automatically saved in the Lock Control Save Register 
when an interrupt or trap occurs. To restore the correct auto-lock value on return 
from the service routine, software sets a bit in the Restore Lock Control Register. 
This will cause the value saved in the Lock Control Save Register to be moved to 
the Lock Control Register when a RETT is executed (see Figure 2-33). 
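

The save-and-restore behavior of the lock registers can be modelled as a small state machine. The Python sketch below is a simplified illustration: the class and attribute names are assumptions made for the example, the Lock Control Register is treated as a single integer whose bits are all cleared on a trap, and whether the hardware clears the restore bit after RETT is not stated here, so the sketch leaves it unchanged.

```python
class LockRegisters:
    """Toy model of the three lock-related registers."""

    def __init__(self):
        self.lock_control = 0       # Lock Control Register (auto-lock bits)
        self.lock_control_save = 0  # Lock Control Save Register
        self.restore_lock = 0       # bit in the Restore Lock Control Register

    def take_trap(self):
        # Hardware saves the current value, then disables auto-locking so
        # the service routine is not locked into the cache.
        self.lock_control_save = self.lock_control
        self.lock_control = 0

    def rett(self):
        # The saved value is moved back only if software requested it.
        if self.restore_lock:
            self.lock_control = self.lock_control_save
```

A service routine that wants auto-locking to resume in the interrupted code sets restore_lock before executing RETT; otherwise auto-locking stays off after the return.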


Lock Register Values

    or   %g0, 0x4, %l0
    or   %g0, 0x1, %g1
    sta  %g1, [%l0]1          ! enable instruction auto-lock

    (code to be locked)

    (trap or interrupt)

    Service routine:
    or   %g0, 0x1, %g1
    and  %g1, Oxfidf, %g1
    wr   %g1, %g0, %psr       ! disable traps
    sta  %g1, [%l0]1          ! set Restore Lock bit

    (end of trap or interrupt)

    (code to be locked)

    or   %g0, 0x0, %l0
    or   %g0, 0x1, %g1
    sta  %g1, [%l0]1          ! disable instruction auto-lock

At each stage, the figure shows the contents of the Restore Lock Control Register,
the Lock Control Save Register, and the Lock Control Register.


Figure 2-33. Caches


2.7 Interrupts and Traps


An interrupt or trap (other than reset) causes a vectored transfer of control
through a trap table which contains the first four instructions of each service
routine. The Trap Base Address field in the Trap Base Register contains the base
address of the table. Associated with each trap type is an 8-bit number, which
(left-shifted by 4 bits) is used as an offset into the table. From the trap table,
control typically passes (via a JMPL instruction) to the appropriate trap handler. The
control transfer for traps other than reset and breakpoint traps is illustrated in
Figure 2-34. Reset always traps to address 0 and breakpoints always trap to
0x000003F0.
TBR:         Trap Base Address (high 20 bits, initialized by kernel) | tt (trap type) | 0 0 0 0

Trap table:  starts at TBA; the entry at offset tt * 16 (bytes) holds instruction 1
             through instruction 4, from which control passes to the trap handler
             routine.


Figure 2-34. Trap and Interrupt Vectoring





A feature called single vector trapping allows all traps to vector to a single location, 
specified by the 20 high-order bits of the TBR, filled out on the right with 0’s. 
After the trap is taken, the trap type can be determined by reading the tt field of 
the TBR. Single vector trapping can save code space and improve the response 
time of traps, since all of the trap service routines can potentially fit in cache. This 
feature, disabled at reset, can be enabled by setting the SVT bit of ASR17. 
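

The vector address calculation, with and without single vector trapping, can be sketched as follows. This Python fragment is illustrative only; the function name trap_vector and the svt argument are assumptions made for the example, and the fixed vectors used by reset and breakpoint traps are not handled.

```python
def trap_vector(tbr, tt, svt=False):
    """Return the address to which an ordinary trap vectors.

    tbr is the Trap Base Register value (only its 20 high-order bits
    are used), tt is the 8-bit trap type, and svt mirrors the SVT bit
    of ASR17.
    """
    tba = tbr & 0xFFFFF000            # Trap Base Address, low 12 bits zeroed
    if svt:
        return tba                    # every trap uses the same location
    return tba | ((tt & 0xFF) << 4)   # 16 bytes (4 instructions) per entry
```

With single vector trapping enabled, the handler at the common entry point reads the tt field of the TBR to find out which trap occurred.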


The Trap Enable bit (ET) of the Processor Status Register enables (ET = 1) and
disables (ET = 0) interrupts and traps. When ET = 0, interrupts are ignored, and traps
cause the Integer Unit to halt and enter the error mode.


The processor provides direct support for 15 interrupt priority levels. The external
interrupt request level (on input pins IRL[3:0]) is compared with the value in
the Processor Interrupt Level field of the PSR. If the request level equals 15, or if it
exceeds the PIL value, the interrupt is taken.


2.7.1 Trap Types


Up to 256 trap types can be distinguished on the basis of the 8-bit trap type num- 
ber. Of these, half are reserved for external interrupts and hardware-enforced 
instruction exceptions. The various trap types are listed in order of priority, with 
their causes, in Table 2-23. 


Table 2-23: Traps

Trap                          Priority  tt       Cause
----------------------------  --------  -------  ------------------------------------------------
reset                                            The external system asserted the -RESET input,
                                                 signalling a reset request. Alternatively, the
                                                 processor entered error mode and so generated an
                                                 internal reset.

instruction_access_exception  2         1        A blocking error exception occurred on an
                                                 instruction access (for example, an MMU indicated
                                                 that the page was invalid or read-protected).

privileged_instruction        3         2        An attempt was made to execute a privileged
                                                 instruction in user mode.

illegal_instruction           4         3        An attempt was made to execute an instruction
                                                 with an unimplemented opcode, or an UNIMP
                                                 instruction, or an instruction that would result
                                                 in illegal processor state (for example, writing
                                                 an illegal CWP into the PSR). Note that
                                                 unimplemented FPop and unimplemented CPop
                                                 instructions generate fp_exception and
                                                 cp_exception traps.

fp_disabled                   5         4        An attempt was made to execute an FPop, FBfcc, or
                                                 a floating-point load/store instruction.

cp_disabled                             36       An attempt was made to execute a CPop, CBccc, or
                                                 a coprocessor load/store instruction.

window_overflow                                  A SAVE instruction attempted to cause the CWP to
                                                 point to a window marked invalid in the WIM.

window_underflow                                 A RESTORE or RETT instruction attempted to cause
                                                 the CWP to point to a window marked invalid in
                                                 the WIM.

mem_address_not_aligned                          A load/store instruction would have generated a
                                                 memory address that was not properly aligned
                                                 according to the instruction, or a JMPL or RETT
                                                 instruction would have generated a
                                                 non-word-aligned address.

data_access_exception                            A blocking error exception occurred on a
                                                 load/store data access (for example, an MMU
                                                 indicated that the page was invalid or
                                                 write-protected).

tag_overflow                                     A TADDccTV or TSUBccTV instruction was executed,
                                                 and either arithmetic overflow occurred or at
                                                 least one of the tag bits of the operands was
                                                 nonzero.

trap_instruction (Ticc)                 128-254  A Ticc instruction was executed and the trap
                                                 condition evaluated to true.

breakpoint trap                                  Instruction or Data Breakpoint encountered.

interrupt_level_15                               External Interrupt Request
interrupt_level_14
interrupt_level_13
interrupt_level_12
interrupt_level_11
interrupt_level_10
interrupt_level_9
interrupt_level_8
interrupt_level_7
interrupt_level_6
interrupt_level_5
interrupt_level_4
interrupt_level_3
interrupt_level_2
interrupt_level_1




2.7.2 Trap Behavior 


The expression trapped instruction refers, in the case of a synchronous trap 
(instruction exception), to the instruction which caused it. In the case of an inter- 
rupt, the trapped instruction is the one which was about to enter the Writeback 
stage of the pipeline when the interrupt occurred. 


The Integer Unit supports precise traps—when an interrupt or trap occurs, the 
saved state of the processor reflects the completion of all instructions prior to the 
trapped instruction, but no subsequent instructions (including the trapped 
instruction). Hardware guarantees that upon return from the service routine, the 
Program Counter points to the trapped instruction (or its successor if the trapped 
instruction was emulated). 


The Integer Unit tests for exceptions generated by an instruction just before that
instruction enters the Writeback stage. If an exception is detected, and no
higher-priority request is pending, and traps are enabled, the processor takes a trap. If
more than one exception is detected, the processor takes the trap with the highest
priority. When a trap is taken, the processor does the following things:

1. Writes the trap type number into the tt field of the Trap Base Register. 


2. Saves the current processor mode (user or supervisor) by copying the value of 
the S bit of the Processor Status Register into the PS bit. 


3. Enters supervisor mode by setting the S bit of the PSR to 1. 
4. Disables traps by clearing the ET bit of the PSR to 0. 


5. Saves the window of the interrupted routine by decrementing the Current
Window Pointer (modulo 8). The Window Invalid Mask is not checked for
window underflow or overflow.


6. Stores the current Program Counter and Next Program Counter values in r[17]
and r[18] of the new window.


7. Transfers control to the address specified by the TBR. 
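

The seven steps above can be traced in a short Python model. This is a sketch under stated assumptions, not a description of the hardware: the function name take_trap is made up for the example, the PSR bit positions (S = bit 7, PS = bit 6, ET = bit 5, CWP = bits 4-0) are taken from the SPARC Version 8 register layout rather than from this section, and single vector trapping is not modelled.

```python
PSR_S = 1 << 7        # supervisor mode (assumed position, SPARC V8 layout)
PSR_PS = 1 << 6       # previous supervisor mode
PSR_ET = 1 << 5       # enable traps
PSR_CWP_MASK = 0x1F   # current window pointer field
NWINDOWS = 8


def take_trap(psr, tbr, tt, pc, npc):
    """Return (new_psr, new_tbr, saved_registers, target) for a taken trap."""
    # Step 1: write the trap type into the tt field of the TBR.
    tbr = (tbr & 0xFFFFF000) | ((tt & 0xFF) << 4)
    # Step 2: copy S into PS.
    if psr & PSR_S:
        psr |= PSR_PS
    else:
        psr &= ~PSR_PS
    # Steps 3 and 4: enter supervisor mode and disable traps.
    psr |= PSR_S
    psr &= ~PSR_ET
    # Step 5: decrement CWP modulo 8; the WIM is deliberately not checked.
    cwp = ((psr & PSR_CWP_MASK) - 1) % NWINDOWS
    psr = (psr & ~PSR_CWP_MASK) | cwp
    # Step 6: PC and nPC go to r[17] and r[18] of the new window.
    saved = {17: pc, 18: npc}
    # Step 7: control transfers to the address held in the TBR.
    return psr, tbr, saved, tbr
```

Because the Window Invalid Mask is not consulted in step 5, the handler itself is responsible for making sure a window is available before traps are re-enabled.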


An instruction is said to be squashed when its execution is aborted after it has 
entered the pipeline. A taken trap always squashes either 2 or 3 instructions. 
Asynchronous traps and interrupts squash 3 instructions as shown in Figure 2-35. 
Software traps (Ticc) only squash 2 instructions because the processor holds the 
next instruction fetch when the trap instruction reaches the memory stage (in 
Figure 2-35, instruction 4 is replaced by a hardware generated NOP). 


[Pipeline timing diagram. Rows: clk, Fetch, Decode, Execute, Memory, Write-Back.
Inst 1, Inst 2, Inst 3 and Inst 4/nop enter the pipeline in successive cycles,
followed by Inst 20 and Inst 21. Annotations: "synchronous or asynchronous trap";
"1st trap handler instruction"; "squashed instructions"; "no result written back to
register file, however PC is written back".]


Figure 2-35. Instructions Squashed by Trap


The trap handler must ensure that a window is available (for taking another trap),
and then re-enable traps by setting ET to 1. The code for handling the exceptional
condition that caused the trap can then be executed. Traps must be disabled (ET
cleared to 0) before returning, via a RETT instruction, from the service routine.


Unless it causes a trap, the RETT instruction does four things: it increments the
Current Window Pointer (modulo 8), causes a delayed control transfer to a
register-indirect target address, restores the processor to the operating mode (user or
supervisor) it was in before the trap was taken, and enables traps. The trap handler
must ensure that a window is available so that RETT can increment the CWP
without causing a window underflow and sending the processor into error mode.
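

The register effects of RETT can be sketched in the same style. This Python fragment is illustrative only: the function name rett is made up for the example, the PSR bit positions are taken from the SPARC Version 8 layout rather than from this section, the delayed control transfer is not modelled, and the sketch assumes RETT is executed with traps disabled, as required above.

```python
PSR_S = 1 << 7        # supervisor mode (assumed position, SPARC V8 layout)
PSR_PS = 1 << 6       # previous supervisor mode
PSR_ET = 1 << 5       # enable traps
PSR_CWP_MASK = 0x1F   # current window pointer field
NWINDOWS = 8


def rett(psr, wim):
    """Return (new_psr, trap) for a RETT executed with ET = 0.

    trap is None on success. If the incremented CWP selects a window
    marked invalid in wim, the PSR is returned unchanged together with
    "window_underflow"; with traps disabled this leads to error mode.
    """
    cwp = ((psr & PSR_CWP_MASK) + 1) % NWINDOWS
    if (wim >> cwp) & 1:
        return psr, "window_underflow"
    # Restore the operating mode that was saved in PS at trap entry.
    if psr & PSR_PS:
        psr |= PSR_S
    else:
        psr &= ~PSR_S
    psr |= PSR_ET                      # enable traps
    psr = (psr & ~PSR_CWP_MASK) | cwp  # increment CWP modulo 8
    return psr, None
```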


2.7.3 Reset and Error Modes 


As defined in the SPARC architecture, the SPARClite integer unit has reset, error,
and execute modes which are states of the processor. The processor is in execute
mode during the normal execution of instructions. The processor enters error
mode if a synchronous trap is encountered while traps are disabled (the ET bit
is 0). The processor enters reset mode when the -RESET input is asserted, and
enters execute mode when the -RESET line is de-asserted.


Once it is in error mode, the processor must be reset in order to return to normal
operations. The external system can detect an error condition by monitoring the
-ERROR signal, which is asserted for a minimum of one cycle.





Processor reset occurs whenever the -RESET input is held active for 4 cycles after 
the clock stabilizes. Reset does the following: 


1. Writes 0 into the Program Counter and 4 into the Next Program Counter. 
When -RESET is de-asserted, the processor will begin fetching instructions at 
address 0x00000000 in supervisor instruction space (ASI 0x09). 


2. Zeroes or sets to the appropriate NOP instruction all registers in the
instruction pipeline. This ensures that:


• No instructions are left half-executed in the instruction pipeline.
• No traps are taken prior to the instruction at address zero.
• No control transfer instructions are in progress.
• No interlock or bypass conditions will be detected prior to the instruction at
address zero.
• No state will be written back prior to the instruction at address zero.


3. Enters supervisor mode by setting the S bit in the PSR. 
4. Disables traps by clearing the ET bit in the PSR. 


2.8 Debug Support Unit 


The Debug Support Unit (DSU) supports target monitors and hardware emula- 
tors with on-chip breakpoint and single-step logic. To be available for use, the 
DSU must be enabled when the processor is reset. The signals used to configure 
the DSU during reset are discussed below. 


A dedicated emulator bus is extended off-chip from the DSU. This bus allows 
transactions between the IU and cache to be monitored by external hardware. In- 
circuit emulators and other debug and diagnostic hardware can monitor this bus 
to trace processor activity. 


This section discusses the breakpoint logic of the DSU. (For more information on 
in-circuit emulation of SPARClite designs, refer to the documentation provided 
with your emulator.) 


2.8.1 Breakpoint Registers 


There are six on-chip Breakpoint Descriptor Registers, two for Instruction
Addresses, two for Data Addresses, and two for Data Values. A Debug Control
Register (Figure 2-36) and a Debug Status Register (Figure 2-37) control the
operation of the breakpoint logic, and reflect its current status.


Figure 2-36. Debug Control Register


Bits 31-24: Data Address 2 ASI: Specifies the ASI match value for Data Address 2.

Bits 23-16: Data Address 1 ASI: Specifies the ASI match value for Data Address 1.

Bit 15: Data Address 2 User/Supervisor Bit: Specifies either a User or Supervisor Mode match
for data address 2.

Bit 14: Data Address 1 User/Supervisor Bit: Specifies either a User or Supervisor Mode match
for data address 1.

Bits 13-9: Reserved.

Bit 8: Enable Data Address 2 Break - Enables (1) or disables (0) the breakpoint comparison for
Data Address Descriptor 2.

Bit 7: Enable Data Address 1 Break - Enables (1) or disables (0) the breakpoint comparison for
Data Address Descriptor 1.

Bit 6: Enable Instruction Address 2 Break - Enables (1) or disables (0) the breakpoint
comparison for Instruction Address Descriptor 2.

Bit 5: Enable Instruction Address 1 Break - Enables (1) or disables (0) the breakpoint
comparison for Instruction Address Descriptor 1.

Bit 4: Single Step - Enables single-step operation when set. During single-step operation, a
breakpoint trap is issued on every instruction.

Bits 3-2: Data Value Transaction Type - Determines the class of instructions (loads, stores, or
both) that can cause a Data Value breakpoint trap.

    Break only on Loads
    Break only on Stores
    Break on Load or Store
    Break Always

Bit 1: Data Value Condition - Determines whether a Data Value breakpoint trap is caused by
values inside the range specified by the Data Value Descriptor Registers, or outside this
range (assuming that the Data Value Mask bit is 0).

Bit 0: Data Value Mask - Controls the interpretation of the Data Value Descriptors. When the
Data Value Mask bit is 1, Data Value Descriptor 2 is used as a mask for Data Value
Descriptor 1. When the Data Value Mask bit is 0, the Data Value Descriptors specify the
upper and lower bounds of a range of data values.


Fields, from bit 5 down to bit 0: Data Address 2 Match, Data Address 1 Match,
Instruction Address 2 Match, Instruction Address 1 Match, EMU_ENBL on Reset,
EMU_BRK on Reset.


Figure 2-37. Debug Status Register


Bits 31-6: Reserved 


Bit 5: Data Address 2 Match—set to (1) if address matched. Software should clear this bit after 
reading it. 

Bit 4: Data Address 1 Match—set to (1) if address matched. Software should clear this bit after 
reading it. 

Bit 3: Instruction Address 2 Match—set to (1) if address matched. Software should clear this bit 
after reading it. 

Bit 2: Instruction Address 1 Match—set to (1) if address matched. Software should clear this bit 
after reading it. 

Bit 1: -EMU_ENBL Asserted on Reset - Set on reset if the -EMU_ENBL input is asserted;
cleared on reset otherwise. Maintains its value until the next reset. -EMU_ENBL and
EMU_BRK are used to configure the DSU on reset. This bit is read only.


Bit 0: EMU_BRK Asserted on Reset - Set on reset when the EMU_BRK input is asserted;
cleared on reset otherwise. Maintains its value until the next reset. -EMU_ENBL and
EMU_BRK are used to configure the DSU on reset. This bit is read only.


The breakpoint descriptor and control registers are memory-mapped to ASI 0x1;
their addresses are listed in Table 2-24.


Table 2-24: Memory Locations of Debug Registers

Address       Register
----------    -------------------------------------------------
...
0x0000FF14    Data Value Descriptor Register 2 or Mask Register
0x0000FF18    Debug Control Register
0x0000FF1C    Debug Status Register


2.8.2 Breakpoint Traps 


Breaks in code execution can be caused by pre-setting a break condition in one of 
the breakpoint descriptor registers, or by setting the Single Step bit in the Debug 
Control Register. Do not attempt to use the breakpoint registers while using an 
emulator for system debugging. 


The breakpoint traps have trap type number 255, and a priority less than the other 
synchronous traps, but greater than trap instructions or external interrupts. When 
a breakpoint trap is recognized by the IU, it branches to address 0x000003F0 
regardless of the value of the TBA field in the Trap Base Register. 


Each of the Address Descriptor Registers specifies a break address. If the address 
of an access matches the register contents, a breakpoint trap occurs. There is one 
bit in the Debug Control Register associated with each Address Descriptor Regis- 
ter, which enables or disables the breakpoint comparison for that register. 


The Data Value Descriptor Registers work in either of two ways. If the value of the
Data Value Mask bit in the Debug Control Register is 1, then Data Value Descriptor 2
is used as a mask for Data Value Descriptor 1. In this mode, only those bits of
Data Value Descriptor 1 for which the corresponding mask bit is 1 are compared. All
other bits are ignored in the breakpoint comparison.


If the Data Value Mask bit is 0, the Data Value Descriptors 1 and 2 act as the lower
and upper bound respectively, for a range comparison. The break condition is
determined by the value of the Data Value Condition bit in the Debug Control
Register. If the Data Value Condition bit is a 0, then the break condition is given
by the expression:


Data Value Descriptor 1 < Accessed Value < Data Value Descriptor 2


If the Data Value Condition bit is a 1, this break condition is inverted, turning the
comparison into an "out-of-range" test.
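

The two interpretations of the Data Value Descriptors can be compared side by side in a short Python sketch. It is illustrative only: the function name data_value_match and its argument names are assumptions made for the example, the strict inequalities follow the expression as printed above, and in mask mode the sketch assumes a match means equality on the unmasked bits with the Data Value Condition bit ignored.

```python
def data_value_match(value, dv1, dv2, mask_bit, condition_bit):
    """Return True if an accessed value satisfies the Data Value comparison.

    dv1 and dv2 are the contents of Data Value Descriptors 1 and 2;
    mask_bit and condition_bit mirror the Data Value Mask and Data Value
    Condition bits of the Debug Control Register.
    """
    if mask_bit:
        # Descriptor 2 acts as a mask: only bits where it is 1 are compared.
        return (value & dv2) == (dv1 & dv2)
    in_range = dv1 < value < dv2
    # Condition bit 1 inverts the test into an out-of-range comparison.
    return (not in_range) if condition_bit else in_range
```

For instance, with the mask bit set, a mask of 0xFF00 and a descriptor of 0x1200, any accessed value whose second byte is 0x12 matches.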


The Data Value comparison may be conditioned by the type of transaction (load 
or store) that is being performed. The following encoding of the Data Value 
Transaction Type bits in the Debug Control Register is used: 


Break only on Loads 
Break only on Stores 


Break on Load or Store 
Break Always 





The chip always ANDs the results of the Data Address 1 comparison and the Data 
Value comparison. To break on all data address matches, use the Break Always 
condition. 


It is the responsibility of monitor code to restore all register window values (with 
the exception of the breakpoint trap window) to their pre-break values before 
returning from the trap. 


2.8.3 Configuration at Reset 


The initial configuration of the DSU is determined by the values on the -EMU_ENBL
and EMU_BRK input pins during the reset, as shown in Table 2-25.


Table 2-25: Configuration of the Debug Support Unit at Reset

• Debug Registers are cleared on RESET; breakpoint registers are enabled.

• Debug Registers are cleared on RESET; all breakpoints are disabled.












2.9 SPARC Compliance 


SPARClite processors are fully compliant with the SPARC architectural specifica- 
tion. 


Compatibility with existing and planned SPARC standards is a cornerstone of the
SPARClite family strategy.


Compatibility assures:


1. a wide range of silicon implementations meeting different price/performance
targets.


2. a ready availability of native development environments and tools.


3. a large and growing base of application software which is object code
compatible.


4. an established and commercially viable processor architecture which is likely
to be around well into the future.


The SPARC architecture was originally developed by Sun Microsystems, Inc.
and first implemented by Fujitsu. SPARC International has since been formed to 
independently promote and control the evolution of the architecture. 


All SPARC processor implementations conform to one of two architecture revi- 
sion levels. The first commercially available version of the architecture is referred 
to as SPARC architecture Version 7. All existing silicon implementations and,
consequently, Sun Microsystems, Inc. SPARCstations™ (1, 1+, 2, SLC, ELC, IPC, IPX)
and SPARC compatible workstations conform to Version 7. A revised version of 
the SPARC architecture, Version 8, became final in March 1991. Future SPARC 
workstations will migrate to SPARC Version 8 processors. All OS and application 
code written for Version 7 processors will run without modification on SPARC 
Version 8 processors. SPARClite series processors conform to Version 8 of the 
SPARC Architecture. 


Version 8 of the SPARC Architecture adds these primary features to Version 7:

• multiply - integer multiply instruction

• divide - integer divide instruction

• write/read ASR - instructions to read and write the Ancillary State Registers,
which are used as additional control registers and implementation-definable
control registers


The architecture does not require that all instructions and features be imple- 
mented, only that the processor will trap on unimplemented features so that they 
can be emulated in software. SPARClite implements the Version 8 multiply 
instruction and read and write ASR instructions. The integer divide instruction is 
not directly supported in hardware. 


The MB86930 implements two instructions not defined by SPARC Version 8:
the Scan and Divide Step instructions. These instructions are decoded
in unused opcodes and provide a superset of SPARC Version 8. If code developed
using these instructions is run on Version 7 or Version 8 SPARC processors other
than SPARClite, an unimplemented instruction trap will occur.



Internal Architecture


The internal architecture of SPARClite family processors is illustrated in
Figure 3-1. The processor consists of a Clock Generator, an Integer Unit, separate
on-chip caches for data and instructions, a Bus Interface Unit, and a Debug 
Support Unit to support the use of in-circuit emulators and target monitors. Inter- 
nally, the various functional units are connected by separate instruction and data 
buses. For connection with external memory and I/O, a unified address bus and a 
unified data bus are extended off-chip. This chapter discusses the individual 
functional units in turn, giving an overview of the flow of data and control signals 
through the processor. 













SPARClite User’s Guide 




























[Block diagram: Clock Generator (CLK_OUT), SPARC Integer Unit, Bus Interface Unit
(DATA, ADDRESS, ASI, PAGE_DET), 2K Data Cache]


Figure 3-1. Internal Architecture (Block Diagram)


3.1 Integer Unit 


The Integer Unit (IU) is a compact, fully custom implementation of the SPARC 
architecture. It is hard-wired for maximum performance; that is, it uses no micro- 
code. It contains three functional units: 


• Instruction Block—Contains the instruction pipeline; decodes instructions into
control signals for the other blocks.


• Address Block—Performs all instruction-address manipulations.


• Execute Block—Performs all data manipulations; generates operand addresses
for load and store instructions and effective addresses for some of the control
transfer instructions.


As shown in Figure 3-2, the IU is based on a Harvard (Aiken) architecture. There
are separate address buses for instructions and data. There are also two 32-bit
data interfaces: the instruction data bus, and the data bus. The use of these four
buses allows the IU to retrieve data and instructions simultaneously from on-chip
cache.


[Data-path diagram: IDATA input, Register File (read ports), and the I ADDRESS,
D ADDRESS, and D DATA buses]


Figure 3-2. Integer Unit Data Path


3.1.1 I Block


The instruction block (I Block) contains the five-stage instruction pipeline and the
logic which decodes instructions into control signals for the rest of the IU. The
I Block detects all bypass and interlock conditions.

The main interfaces to the I Block are:

• Instruction data bus from the instruction cache or main memory.


• Immediate data field, which goes to the A Block for computing PC-relative
control transfers, and to the E Block to be used as immediate data.


• Control signals to the A Block and E Block, including the register file read and
write addresses, register enable signals, multiplexer controls, and partly or
fully decoded operation codes for the ALU/Shifter.


• Status signals back from the E Block, including possible trap conditions such
as memory_address_not_aligned or tag_overflow.



Instruction Pipeline


The IU implements a five-stage instruction pipeline to allow a sustained execution
rate of nearly one instruction per cycle. The operation of the pipeline under
ideal conditions is illustrated in Figure 3-3. The pipeline consists of the following
stages:

1. Fetch (F)—One of the instruction memory spaces is addressed and returns an 
instruction. (The figure below assumes a hit in the instruction cache.) 


2. Decode (D)—The instruction is decoded; the register file is addressed and 
returns operands. 


3. Execute (E)—The ALU computes a result. 


4. Memory (M)—External memory is addressed (for load and store instructions 
only; this stage is idle for other instructions). 


5. Writeback (W)—The result (or loaded memory datum) is written into the 
register file. 


[Timing diagram: successive instructions advancing through the Fetch, Decode,
Execute, Memory, and Write-Back stages, one stage per CLK cycle]


Figure 3-3. Instruction Pipeline


No instructions execute out of order; that is, if instruction A enters the pipeline
before instruction B, then instruction A necessarily reaches the Writeback stage
before instruction B does.
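
As an illustration of in-order flow under ideal conditions (no holds, instruction-cache hits only), the sketch below lists which instruction occupies each stage in a given cycle. The function names are hypothetical and the model is a simplification for reading the timing, not a description of the hardware.

```python
STAGES = ["F", "D", "E", "M", "W"]  # Fetch, Decode, Execute, Memory, Writeback


def ideal_schedule(num_instructions):
    """Return {cycle: {stage: instruction_index}} assuming no hold conditions."""
    schedule = {}
    for instr in range(num_instructions):
        for depth, stage in enumerate(STAGES):
            # Instruction i enters Fetch in cycle i and advances one stage per cycle.
            schedule.setdefault(instr + depth, {})[stage] = instr
    return schedule


def writeback_cycle(instr):
    """Cycle in which instruction `instr` reaches Writeback in the ideal case."""
    return instr + len(STAGES) - 1
```

In this model, cycle 2 of a three-instruction sequence has instruction 0 in Execute, instruction 1 in Decode, and instruction 2 in Fetch, and an earlier instruction always reaches Writeback first.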


The control logic for the instruction pipeline is illustrated in Figure 3-4. At each 
cycle a horizontal control word is available which is wider than 32 bits and con- 
trols every multiplexer, latch-enable, and unit op-code in the chip. The horizontal 
control word is composed of control signals active during the decode stage of 
instruction N, the execute stage of instruction N-1, the memory stage of instruc- 
tion N-2 and the writeback stage of instruction N-3. Some control bits require no 
decoding and are simply hardwired from the appropriate bits in the instruction 
register. Because the SPARC instruction set is not completely orthogonal (not 
every instruction field has the same meaning in every instruction) most bits 
require some decoding based on a single instruction in the pipeline. Some control
bits require decoding using logic that looks at two instructions in the pipeline, as,
for example, in controlling multiplexers to select data bypass paths.


[Diagram: the instructions held in the pipeline stages (through Writeback) feed
combinational logic that produces the horizontal control word]


Figure 3-4. Instruction Pipeline Control Logic








Pipeline Hold 


The IU does not complete one instruction on absolutely every cycle. On a load 
instruction, for example, external memory may be slow in returning the requested 
data. Because the IU does not execute or complete instructions out of order, the 
pipeline must be held up until the requested data is returned. Only then can the 
instruction complete and only then can the subsequent instructions continue. 


There are also some hazards built into the IU datapath which require interrupting 
the one-cycle-per-instruction sequence of the pipeline. For example, a double- 
word load cannot be performed in one cycle because there is not enough memory 
or register-file bandwidth to move the data through the datapath. Another exam- 
ple is a load to a register which is followed by an instruction which uses that 
register. Because the operand of the second instruction is required in the decode 
stage but is not available, this instruction must be delayed until the operand is 
available. 


Conditions which hold up the processor pipeline are handled uniformly by the 
I Block control logic and are referred to as hold conditions. A complete list of possi- 
ble hold conditions is given in Table 3-1. 


Table 3-1: Conditions Which Cause a Pipeline Hold

Name          Description                        Pipeline Stage  Instructions Affected
------------  ---------------------------------  --------------  ----------------------
              ... instruction that is not yet
              available.
dhold         Data is not yet available.         Memory          Loads and Stores
              Multiplication in progress.
Interlock     An instruction in the pipeline                     Load/Use and
              must wait for some prior                           CALL/Use r15
              instruction to be completed                        Instruction Pairs
              (through Writeback).
Multicycle    An instruction which inherently    Execute         Load and Store
Instruction   requires more than one cycle is                    Double-word, Atomic
              in the pipeline.                                   Load/Store


The interlock conditions are: 


• Load/Use Instruction Pairs—If a load instruction which has rd=N as its
destination register is followed by an instruction which uses rs=N as one of its
source operands, then the load must proceed through Writeback before the
following instruction can enter the Execute stage.


• CALL/Use r15 Instruction Pairs—Similarly, since the CALL instruction
implicitly writes the current value of the PC into r15, it must proceed to
Writeback before any following instruction which uses r15 can enter the
Execute stage.


Any time an interlock is detected, a NOP is inserted into the pipeline. The address 
block is signaled, so that the address of the instruction which causes the interlock 
is replicated in the address pipe. The NOP itself cannot cause a trap. 


The multicycle instructions are LDD, LDDA, STD, STDA, LDSTUB, LDSTUBA, 
SWAP, and SWAPA. When a multicycle instruction enters the Execute stage, it 
and the instruction in the d_ir register are frozen for an additional cycle. 
Although it is possible to detect a multicycle instruction while it is in the Decode 
stage (unlike interlocks, which cannot be detected without looking at two instruc- 
tions, those in the d_ir and e_ir registers), the I Block allows it to progress to the 
Execute stage before a hold is generated and inserted. This simplifies control 
somewhat because there are fewer points at which the pipeline must be held. 


Note that the maximum number of internally generated hold cycles an instruction 
can cause is two, as in the following case: 


LDD [%r1+%r2],%r4
ADD %r5,%r5,%r6


The LDD takes two cycles, and it generates an interlock because the next instruc- 
tion uses the data loaded in the second data memory cycle of the LDD instruction. 
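
The counting of internally generated hold cycles can be sketched as follows. This is a simplified model under stated assumptions: the Instr tuple and the function name are hypothetical, only the multicycle and interlock holds described above are modelled, and externally caused holds (such as slow memory) are ignored.

```python
from collections import namedtuple

# Hypothetical instruction record: opcode plus destination and source register numbers.
Instr = namedtuple("Instr", "op dests sources")

MULTICYCLE = {"LDD", "LDDA", "STD", "STDA", "LDSTUB", "LDSTUBA", "SWAP", "SWAPA"}
LOADS = {"LD", "LDA", "LDUB", "LDUBA", "LDSB", "LDSBA", "LDUH", "LDUHA",
         "LDSH", "LDSHA", "LDD", "LDDA", "LDSTUB", "LDSTUBA", "SWAP", "SWAPA"}


def internal_hold_cycles(first, second):
    """Hold cycles caused by `first` when `second` follows it immediately."""
    holds = 0
    if first.op in MULTICYCLE:
        holds += 1  # multicycle instruction: frozen for one additional cycle
    load_use = first.op in LOADS and bool(set(first.dests) & set(second.sources))
    call_use = first.op == "CALL" and 15 in second.sources
    if load_use or call_use:
        holds += 1  # interlock: a NOP is inserted into the pipeline
    return holds


# The LDD/ADD pair from the text: LDD writes r4 and r5, ADD reads r5.
ldd = Instr("LDD", dests=(4, 5), sources=(1, 2))
add = Instr("ADD", dests=(6,), sources=(5, 5))
```

For the pair above, the model returns two hold cycles: one for the multicycle LDD and one for the load/use interlock on r5.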


When a hold condition occurs, combinational logic generates one or more freeze 
signals, which prevent latches from being updated, and hence keep the pipeline 
from advancing. For some holds—dhold, for example—the entire pipeline is
frozen, with freeze signals being generated for all stages in the pipeline. For other
holds—interlock conditions, for example—later stages in the pipeline must
advance for the hold condition to be resolved. Thus only the earlier stages of the
pipeline are frozen.


Trap Logic 





SPARClite supports precise traps; that is, when a trap occurs, the saved
programmer-visible state of the processor reflects the completion of all instructions
prior to the trapped instruction, and of no later instruction, including the trapped
instruction itself. Thus, when an instruction causes a trap, one of two statements is true:


• No results from that instruction have been written into the programmer-visible
registers (the register file or the PSR, TBR, WIM, or Y registers).


• Or, if data has been written into a programmer-visible register, the data
contained in that register prior to being written by the trapped instruction is
saved by the processor and can be restored when the trap is taken.


Table 3-2 shows the pipeline stages in which the various trap conditions are 
detected. 


Table 3-2: Detection of Trap Conditions

[The pipeline-stage and priority columns are not legible; trap type numbers
are shown where they survive.]

Trap Type                            tt
reset (hardware reset)
reset
instruction_access_exception
illegal_instruction
priv_instruction
illegal_instruction
fp_disabled
cp_disabled
window_overflow
window_underflow
mem_address_not_aligned
data_access_exception
tag_overflow                         10
trap_instruction (Ticc)              128-254
instruction_breakpoint               255
data_breakpoint                      255
interrupt_level_15
interrupt_level_14
...
interrupt_level_1


As shown in Table 3-2, the latest stage in which a trap can be detected is the Memory
stage (a data memory exception for a load or store). If a programmer-visible
register is updated prior to this stage, its original contents must be restored when
and if the trap is taken.


Due to the pipelined operation of the IU, a trap condition for one instruction may 
actually be detected before a trap condition for a prior instruction. Thus, it is nec- 
essary to align the detected trap conditions so that all trap conditions for instruc- 
tion N are considered together, before considering any trap conditions resulting 
from instruction N+1. 


The trap coder is illustrated in Figure 3-5. Its purpose is to align in time the
(possibly multiple) trap sources for a single instruction, to determine if a trap is to be
taken or not, and if so, to determine the highest priority trap and code its trap
type.


[Diagram: Fetch-, Decode-, Execute-, and Memory-stage trap sources, together with
the Memory-stage instruction register, feed a combinational block (qualify,
prioritize, encode) that outputs a trap yes/no signal and the trap type
(to A Block)]


Figure 3-5. Trap Coder





When a trap is taken, the trap type field goes to the A Block where it is used 
immediately as a trap target address (when concatenated with the Trap Base 
Address) and is latched into the Trap Base Register. 


3.1.2 A Block 


The A Block contains the address pipeline. Along with the E Block, it is responsi- 
ble for all instruction-address manipulations. The A Block executes the CALL and 
Bicc instructions. The A Block and E Block are used together to execute the JMPL, 
Ticc, and RETT instructions; in these cases, the A Block controls the update of the 
Program Counter. The A Block’s main interface to the rest of the chip outside the
IU is the instruction address bus.


The address pipeline is illustrated in Figure 3-6. The fetch-stage program counter
(PC) is used to address instruction memory via the instruction address bus.
Because a CALL, JMPL, or trap may require that the address of an instruction be 
written back to the register file, the address of every instruction tracks the instruc- 
tion itself in the instruction pipeline so that it is available in the memory stage if it 
needs to be written back to the register file. These address pipeline registers are 
the decode, execute, and memory program counters. Each of these registers con- 
tains the address from which the instruction in the corresponding instruction 
register was fetched. 


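[Diagram: the PC input multiplexer selects among the +4 incrementer, the address
adder (immediate data, 30 bits), the jump address (from E Block), and the trap
type (from I Block); the address then passes through the decode-, execute-, and
memory-stage program counters (e_pc, m_pc). Outputs: instruction address (to
instruction memory) and return address (to E Block). A separate path is used
for multicycle instructions.]


Figure 3-6. Address Pipeline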


The PC has five possible sources:

1. +4 incrementer, for normal, sequential instruction fetch.


2. The address adder, for PC-relative control transfer (Bicc or CALL instruction).
The immediate data field contains offset information and comes from the
I Block.


3. The jump address for a JMPL or RETT instruction. The jump address bus
contains jump target information, and comes from the E Block by way of the
register file and ALU.


4. The TBR, concatenated with the trap type (tt) or with zeroes (when Single-Vector
Trapping is enabled), on a Ticc instruction or an interrupt or trap. The
trap type comes from the trap priority encoder, part of the I Block; when
concatenated with TBR[31:12], it gives the target address for a trap.


5. Zeroes, concatenated with the trap type, for reset.
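
The trap-related PC sources (items 4 and 5) amount to a bit concatenation, sketched below. The function name and flags are hypothetical. The placement of the 8-bit trap type in address bits 11:4, with bits 3:0 zero, follows the standard SPARC TBR layout and is assumed here rather than taken from this section.

```python
def trap_vector(tbr, tt, single_vector=False, reset=False):
    """Trap target address formed from TBR[31:12] and the trap type.

    single_vector: Single-Vector Trapping enabled, so the tt field is zeroes.
    reset: the base address is zeroes instead of TBR[31:12].
    """
    base = 0 if reset else (tbr & 0xFFFFF000)              # TBR[31:12]
    tt_field = 0 if single_vector else ((tt & 0xFF) << 4)  # address bits 11:4
    return base | tt_field
```

For example, with a trap base address of 0x40000000, a tag_overflow trap (tt = 10) vectors to 0x400000A0 in this model.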

Note that “+4” is used to indicate that the (byte) address is incremented by 4 to
fetch the next instruction. In reality, the two least significant bits of the address
are not implemented in hardware because they are never used. Word alignment,
for the case of a jump address coming from the E Block, is verified in the E Block
(and to some extent, the I Block).


The return address bus is written back to the register file in the case of a CALL,
JMPL, or trap.

Several control signals come from the I Block. These include:

• PC input-select signals, which control the PC input multiplexer.


• The address adder control signal, which determines whether a 30-bit or a 22-bit
immediate address field is added to the previous value of the PC (now
found in the decode-stage PC).

• Pipeline freeze signals, which can prevent the updating of registers in the
pipeline when a hold condition is detected.
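
The address adder's two modes can be illustrated with a small sketch. The function names are hypothetical. It assumes the standard SPARC encoding, in which the 22-bit (Bicc) and 30-bit (CALL) fields are signed word displacements relative to the address of the control-transfer instruction itself.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def pc_relative_target(pc, displacement, bits):
    """Target of a PC-relative transfer; bits is 22 for Bicc or 30 for CALL."""
    if bits not in (22, 30):
        raise ValueError("displacement field must be 22 or 30 bits wide")
    # The displacement counts words, so it is scaled by 4 to give a byte offset.
    return (pc + 4 * sign_extend(displacement, bits)) & 0xFFFFFFFF
```

A displacement of all ones in either field is -1 word, so a branch at 0x1000 with that displacement targets 0xFFC.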


3.1.3 E Block 


The E Block is responsible for all IU data manipulations. It generates operand 
addresses for load and store instructions and effective addresses for some of the 
control transfer instructions. 


As shown in Figure 3-7, the E Block contains the Store Align Unit (SAU), the Load 
Align Unit (LAU), the Register File (RF), and the Adder, Shift, and Logic Unit 
(ASLU). The E Block also contains the result bypass logic that determines which 
operands are driven into the ASLU, and the store bypass logic that determines
what data is latched for stores.


[Diagram: Execute Block data path, with D ADDRESS and D DATA outputs]


Figure 3-7. Execute Block


Adder, Shift, and Logic Unit (ASLU) 


The ASLU incorporates an integer adder, a barrel shifter, a logic unit, and a scan 
unit. The integer adder calculates the results of the addition, subtraction, multi- 
ply-step, and divide-step instructions, and generates the carry, overflow, nega- 
tive, and zero condition code values. It is used in load and store operations to 
calculate effective data addresses, and in register-indirect control transfers to cal- 
culate the new address to be placed in the PC register of the A Block. The integer 
adder also serves the multiplication unit by adding the “sum” and “carry” vectors 
during integer multiplications. The barrel shifter/logic unit executes the logic and
shift instructions. The scan unit exists solely to support the scan instruction.


Results from the integer adder, the barrel shifter, the logic unit, and the scan unit 
are multiplexed into the R (Result) Register. Results from the integer adder are 
also made available to the Y Register. 


Register File 


The register file contains 136 registers of 32 bits each. The organization of these 
registers into windows is discussed in the Programmer’s Model chapter. The regis- 
ter file has one write port and three read ports. The write port is used for the 
instruction destination register (denoted rd in instruction descriptions). Two of 
the read ports are used for the two instruction source registers (rs1 and rs2). The
remaining port is used for the data to be stored when a store or swap instruction
is executed. In this way, even store instructions can be executed in a single cycle.


The register file also contains the address decoders for all four ports. Each address 
presented to the decoders consists of 8 bits derived from an instruction field and 
the Current Window Pointer. These are physical addresses into the register file 
memory array. 
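
The text does not give the physical layout of the register file. As a sketch only, the mapping below is one layout consistent with 136 registers: 8 globals plus 8 windows of 16 registers each, with the out registers of one window shared as the in registers of its neighbour. The window count of 8 and the ordering of registers within the array are assumptions.

```python
NWINDOWS = 8       # assumed: 136 = 8 globals + 8 windows * 16 registers
GLOBALS = 8


def physical_register(cwp, reg):
    """Map (Current Window Pointer, 5-bit register number) to a physical index."""
    if not 0 <= cwp < NWINDOWS:
        raise ValueError("CWP out of range")
    if not 0 <= reg < 32:
        raise ValueError("register number out of range")
    if reg < GLOBALS:
        return reg  # globals r0-r7 do not depend on the CWP
    # Outs (r8-r15), locals (r16-r23), ins (r24-r31); the ins of window w
    # occupy the same cells as the outs of window w + 1.
    return GLOBALS + (cwp * 16 + (reg - 8)) % (NWINDOWS * 16)
```

With this layout every physical index fits in 8 bits, and the outs of window 2 are the same cells as the ins of window 1, which is what lets a SAVE pass arguments without copying.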


Bypass Logic 


As shown in Figure 3-7, the A and B operand registers have inputs which come 
from sources other than the register file or immediate data bus. These inputs are 
results from previous instructions which have not yet written back to the register 
file. There are two such bypass paths in the E Block: 


• Result Bypass—The result of an ALU operation in the R register is written back
to the A or B operand register in the Memory stage of the following ALU
operation.


• Write Bypass—The data in the W register is written to the A or B operand
register, in the Writeback stage.


The result bypass path is selected when one instruction generates a result that can 
be used by the immediately following instruction. More precisely, if an instruc- 
tion in the Decode stage of the pipeline has rs1 = N and the instruction in the 
Execute stage has rd = N, the rs1 operand will not come from the register file, but 
directly from the R register in the ALU through the result bypass. Since an
intervening SAVE or RESTORE instruction may have changed the Current Window
Pointer, it is the physical addresses of the register source and destination which are
compared, not the logical addresses (which depend on the CWP).


As an example, consider the instruction sequence: 


add %r1,%r2,%r3 ; r1 + r2 -> r3
add %r3,%r4,%r5 ; r3 + r4 -> r5


The second add instruction takes its A source operand not from the register file 
but directly from the result of the ALU, through the result bypass. 


The write bypass is selected when an instruction in the Decode stage has rs1 = N 
and the instruction in the Memory stage has rd = N. In this case, the rs1 operand 
will not come from the register file, but from the W register through the write 
bypass. In the following instruction sequence, the third instruction obtains its
A source operand through the write bypass:


add %r1,%r2,%r3 ; r1 + r2 -> r3
add %r4,%r5,%r6 ; r4 + r5 -> r6
add %r3,%r7,%r8 ; r3 + r7 -> r8


If both bypass conditions apply, the result bypass takes precedence. 
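
The selection rule for a source operand can be summarized in a short sketch. The function and argument names are hypothetical. Register numbers are taken to be physical addresses, as the text requires, and special cases such as instructions that write no destination are represented simply by passing None.

```python
def select_operand_source(rs_phys, execute_rd_phys, memory_rd_phys):
    """Choose where the Decode-stage instruction gets one source operand.

    rs_phys: physical address of the source register being read.
    execute_rd_phys / memory_rd_phys: physical destination of the instruction
    currently in the Execute / Memory stage, or None if it writes no register.
    """
    if execute_rd_phys is not None and rs_phys == execute_rd_phys:
        return "result_bypass"   # from the R register
    if memory_rd_phys is not None and rs_phys == memory_rd_phys:
        return "write_bypass"    # from the W register
    return "register_file"
```

Checking the Execute-stage destination first encodes the rule that the result bypass takes precedence, which selects the most recent value of the register.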


There is a third bypass path, called the store bypass. It can be seen in Figure 3-7. 
The register file has a dedicated store port which is used for reading the rd regis- 
ter of a store instruction; this register contains the data to be stored. The store port 
is read in the Execute stage of the store. When a store and the immediately pre- 
ceding instruction access the same rd register, a bypass from the Writeback stage 
of the preceding instruction to the Memory stage of the store is needed. In the 
code sample below, the result of the first instruction becomes available to the 
Memory stage of the store by means of the store bypass path. 


add %r4,%r5,%r3 ; r4 + r5 -> r3
st %r3,[%r4+%r5] ; r3 -> mem[r4 + r5]


Branch Evaluation Logic 


The branch evaluation logic, which forms part of the E Block, evaluates branch 
conditions based on the current values of the integer condition codes of the PSR 
register. The icc bits n (negative), z (zero), c (carry) and v (overflow) form part of 
the branch evaluation block. The interpretation of these bits is discussed in the 
Programmer's Model chapter. 


There are several ways the icc bits can be modified. First of all, they can be written 
and read via the jump address bus by the instructions WRPSR and RDPSR. 


Certain arithmetic instructions modify the icc bits as a side effect. When one of 
these instructions is executing, the new icc values are generated in the E Block 
during the Execute stage, latched at the end of this stage, and loaded into the PSR 
during the Memory stage. 


Another path leads to the icc bits from the Writeback-stage copy of the PSR. When 
a trap occurs on an instruction which alters the icc bits, this path allows the pre- 
trap icc values to be restored to the PSR. 


The combinational logic which does the branch evaluation for the IU condition 
codes has as inputs: 


• Integer Condition Codes—Directly from the ALU, if the instruction in the
Execute stage is one of those that can modify the icc; from the multiplication
unit; or from the icc bits of the PSR, if the instruction in the Execute stage is not
one that can modify the icc.


• The cond Field—From the branch instruction in the Execute stage. (See the
discussion of the Bicc instruction in the Programmer’s Model chapter.)


• Bicc Indicator—A control signal indicating whether or not the instruction in the
Decode stage is a Bicc instruction. This signal remains valid into the Execute
stage.


The output of the combinational logic is a single signal which, when active, causes 
the branch target address to be loaded into the PC during the Execute stage; 
otherwise, PC+4 is loaded into the PC. 
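
The evaluation itself can be sketched as a function of the cond field and the four icc bits. The function name is hypothetical. The cond encodings used are the standard SPARC Bicc encodings, which are assumed here because this section does not list them.

```python
def branch_taken(cond, n, z, v, c):
    """Evaluate a 4-bit Bicc cond field against the icc bits (given as booleans)."""
    base = [
        False,           # 0: BN   (never)
        z,               # 1: BE
        z or (n != v),   # 2: BLE
        n != v,          # 3: BL
        c or z,          # 4: BLEU
        c,               # 5: BCS
        n,               # 6: BNEG
        v,               # 7: BVS
    ][cond & 0x7]
    # Conditions 8-15 (BA, BNE, BG, BGE, BGU, BCC, BPOS, BVC) are the
    # complements of conditions 0-7.
    return (not base) if (cond & 0x8) else base


def next_pc(pc, target, cond, n, z, v, c):
    """Branch target if the condition holds, otherwise PC+4."""
    return target if branch_taken(cond, n, z, v, c) else (pc + 4) & 0xFFFFFFFF
```

The single boolean returned by branch_taken corresponds to the one output signal of the combinational logic described above.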


Load Align Unit (LAU) and Store Align Unit (SAU) 


The LAU and SAU align data for loads and stores, respectively. Bytes and half- 
words to be loaded are right-justified in a 32-bit word, and either sign-extended 
or zero-extended on the left, depending on whether the load instruction specified 
signed or unsigned operation. The LAU performs the alignment and extension 
during Writeback. 


Byte and halfword stores take their data from the least significant byte or half- 
word of the register specified in the instruction’s rd field. The SAU performs the 
necessary alignment for writing the data to the byte or halfword memory address 
specified in the instruction. 
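
The load alignment step can be illustrated as follows. The function name and arguments are hypothetical. The sketch assumes SPARC's big-endian byte numbering, in which byte offset 0 is the most significant byte of the 32-bit word.

```python
def load_align(word, byte_offset, size, signed):
    """Right-justify a byte (size=1) or halfword (size=2) taken from a 32-bit word.

    The result is sign-extended or zero-extended to 32 bits.
    """
    if size not in (1, 2):
        raise ValueError("size must be 1 (byte) or 2 (halfword)")
    if byte_offset % size or not 0 <= byte_offset <= 4 - size:
        raise ValueError("misaligned or out-of-range offset")
    width = 8 * size
    shift = 8 * (4 - size - byte_offset)   # big-endian: offset 0 is the top byte
    value = (word >> shift) & ((1 << width) - 1)
    if signed and value & (1 << (width - 1)):
        value |= 0xFFFFFFFF & ~((1 << width) - 1)   # fill upper bits with ones
    return value
```

For the word 0x11F23344, a signed byte load at offset 1 gives 0xFFFFFFF2, while the unsigned form gives 0x000000F2.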


Multiply Unit 


The E Block contains hardware to perform integer multiplications. The Multiply 
Unit (MU) multiplies two 32-bit signed or unsigned integers to produce a 64-bit 
product. Some multiplication instructions modify the integer condition codes as a 
side effect; others do not. The multiplication instructions are discussed in the 
Programmer's Model chapter. 


The multiply hardware implements a version of Booth’s algorithm. Booth’s algo- 
rithm is similar to a “shift and add” multiply algorithm in that it scans the multi- 
plier from the least significant to the most significant bit and, based on the bit 
string encountered, iteratively adds the multiplicand to produce partial products. 
It is also similar in that the resulting partial product is right shifted to ready it for 
the following iteration of the algorithm. Booth’s algorithm differs from a “shift 
and add” algorithm in that it can also be used directly with a negative multiplier 
(whereas “shift and add” requires a positive multiplier). It differs also in that the 
hardware must provide for both addition and subtraction of the multiplicand. In 
particular, a 1-bit Booth’s algorithm examines two multiplier bits per iteration, 
looks for a bit transition, and either adds the multiplicand, subtracts the
multiplicand, or adds zero to the existing partial product to produce the new partial
product. It “retires” one bit of the multiplier per iteration. For a 1-bit Booth’s
algorithm, Table 3-3 shows the possible bit transitions encountered in the multiplier
and the value which is added to the shifted partial product for each transition.


Table 3-3: Booth’s Algorithm

Multiplier Bits          Add to Shifted
Current    Previous      Partial Product
   0          0          +0
   0          1          +multiplicand
   1          0          -multiplicand
   1          1          +0


This technique can be extended so that more than one bit is examined during a 
given iteration. In particular, the MU performs an 8-bit Booth’s algorithm. It 
examines 9 bits of the multiplier at a time and, based on the eight transitions of 
these nine bits, determines what multiple of the multiplicand to add to the old 
partial product to produce the new partial product. The addition is performed in
the ASLU.


The MU produces 8 bits of the final product and “retires” 8 bits of the multiplier 
per cycle, and therefore requires only 5 cycles to do a 32x32 bit multiply (produc- 
ing a 64-bit result). 
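
The 1-bit form of the algorithm is short enough to write out. The sketch below is a software model, not the hardware's 8-bit datapath: the function name is hypothetical, and instead of right-shifting the partial product it adds a left-shifted multiplicand, which is arithmetically equivalent.

```python
def booth_multiply(multiplicand, multiplier, bits=32):
    """Signed multiply using 1-bit Booth recoding of a `bits`-wide multiplier."""
    m_bits = multiplier & ((1 << bits) - 1)   # two's-complement bit pattern
    product = 0
    previous = 0                              # implicit bit to the right of bit 0
    for i in range(bits):
        current = (m_bits >> i) & 1
        if (current, previous) == (0, 1):
            product += multiplicand << i      # end of a run of ones
        elif (current, previous) == (1, 0):
            product -= multiplicand << i      # start of a run of ones
        previous = current                    # one multiplier bit retired
    return product
```

Because a run of ones costs only one subtraction and one addition, a negative multiplier such as -1 (all ones) needs a single subtraction and no correction step.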


The execution of the instruction is controlled by a synchronous state machine 
which generates control signals for the multiply hardware. Since instructions do 
not execute out of order, the Integer Unit (IU) must be frozen during the multiply 
instructions which take more than 1 cycle. Conceptually, the multiply instruction 
goes through all the pipeline stages (F,D,E,M,W), but its Execute stage is from 1 to 
5 machine cycles long. During the Fetch and Decode stages, the multiply instruction
progresses like any other instruction.


3.1.4 Programmer-Visible State and Processor State 


The SPARC Architecture defines the programmer-visible state of the processor as a 
collection of registers, and then specifies the effects of instructions in terms of 
these registers. These definitions implicitly assume that every instruction com- 
pletes before the next one begins. The SPARClite processor, however, is pipe- 
lined, so that normally four subsequent instructions begin before the first one 
completes. The actual processor state (excluding the register file) therefore encom- 
passes more than the programmer-visible state. For most of the programmer- 
visible registers, there is a corresponding register in the processor associated with 
the Writeback stage of the pipeline. That is, instructions normally update the
register file and programmer-visible state registers in the Writeback stage.


An instruction may update staged copies of the PSR before Writeback, making the
new values available to subsequent instructions sooner, but these staged copies
are not user visible. The PSR associated with the Writeback stage can never be
updated early; if an instruction traps, it will not have altered any state which
cannot be restored.


3.1.5 IU Support for Debugging 


The IU supports the on-chip Debug Support Unit as well as external ICE circuitry 
and software with the following features: 


• A special breakpoint trap type, instruction_breakpoint/data_breakpoint: This
is a synchronous trap with trap type 255 and a priority less than the other
synchronous traps, but greater than the software traps or interrupts. It is
analogous to the instruction_access_exception and data_access_exception traps,
but has the following special characteristics:


• Any instruction can cause a breakpoint exception (unlike the
data_access_exception, which can only occur for load/store instructions).


• The trap vector for this trap, when taken, is not the TBR concatenated with the
trap type, but zero concatenated with the trap type. That is, the trap target
address is 0x00000FF0 regardless of the value in the TBR.


3.2 Data and Instruction Caches


The SPARClite architecture provides separate data and instruction caches, allow- 
ing designers to build high-performance systems without incurring the cost of 
fast external memory and its associated control logic. The software-visible fea- 
tures of the caches are discussed in detail in the Programmer’s Model chapter, 
above. 


The data and instruction caches are accessed independently over separate data 
and instruction buses, allowing data to be loaded from and stored to cache at 
peak rates of one cycle per instruction. The instruction cache is read-only, one 
word at a time. The data memory is readable and writable by bytes, halfwords, 
words or doublewords. 


In the MB86930 processor, each cache is 2 Kbytes in size, organized into two 
banks of sixty-four 16-byte lines. Cache lines are refilled in 4-byte increments to 
avoid the interrupt latency incurred by long, uninterruptible cache line replace- 
ments. In a unified (instruction and data) external memory, the instruction and 
data memory segments should be at aligned 4-word (line size) boundaries. 


The instruction cache has four major RAM arrays: two arrays for
instruction memory and two arrays for tags. In addition to the tag memory, the
tag arrays contain the logic that compares the stored address tag with the address
being accessed and checks the VALID bits in the tag. The hit-detection logic
is illustrated in Figure 3-8.






[Figure: block diagram with signals ADR<31:2>, ADR<9:4>, TAG, ADR<31:10>, ASI<7:0>, HIT 1, and HIT 2]


Figure 3-8. Cache Hit Detection Logic 
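The address split implied by Figure 3-8 and by the MB86930 cache geometry (two banks of sixty-four 16-byte lines) can be sketched in a few lines of Python. This is only an illustrative model: the class and function names are hypothetical, a single valid bit per line is an assumption, and the ASI<7:0> input shown in the figure is left out for brevity.

```python
# Illustrative model of the hit check for one MB86930 cache.
# Geometry from the text: 2 banks x 64 lines x 16 bytes = 2 Kbytes.
LINE_BYTES = 16
LINES_PER_BANK = 64
BANKS = 2


def split_address(addr):
    """Split a 32-bit byte address into (tag, index, offset)."""
    offset = addr & 0xF             # ADR<3:0>: byte within the 16-byte line
    index = (addr >> 4) & 0x3F      # ADR<9:4>: selects one of 64 lines
    tag = (addr >> 10) & 0x3FFFFF   # ADR<31:10>: compared with the stored tag
    return tag, index, offset


class Cache:
    def __init__(self):
        # Each entry is (tag, valid). One valid bit per line is an assumption.
        self.tags = [[(0, False)] * LINES_PER_BANK for _ in range(BANKS)]

    def fill(self, bank, addr):
        """Record that the line containing addr now lives in the given bank."""
        tag, index, _ = split_address(addr)
        self.tags[bank][index] = (tag, True)

    def lookup(self, addr):
        """Return (HIT 1, HIT 2): one hit flag per bank."""
        tag, index, _ = split_address(addr)
        return tuple(
            valid and stored == tag
            for stored, valid in (self.tags[b][index] for b in range(BANKS))
        )
```

For example, after `fill(1, 0x400)` a lookup of `0x40C` hits in the second bank only, because both addresses share the same tag and line index.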


The organization of the data cache is similar to the instruction cache. In addition, 
the data memory has individual write control for each byte. This makes it possible 
to do byte or half-word writes without using read-modify-write cycles. 


3.3 Bus Interface Unit 


The Bus Interface Unit (BIU) contains the logic which allows the processor to 
communicate with the system. The BIU receives requests for external memory 
and I/O accesses from the cache control logic. When the BIU performs a read, it 
returns the data to both the cache and the IU. Parallel paths make the data avail- 
able to the IU in the same cycle that it is written to the cache. The BIU also handles 
external requests for control of the bus. The external signals of the BIU, and the 
relative timing of events in typical bus operations, are discussed in the External 
Interface chapter, below. That chapter also treats the various system-support 
features of the processor in detail. 


3.3.1 Buffers 


The BIU has a one-word (32-bit) write buffer to hide external memory latency 
from the IU. When the BIU receives a request for a write transaction it stores the 
write data and address in the write buffer and indicates the completion of the 
write to the IU. It then proceeds to complete the write to external memory. This 
allows the IU to continue operation from the cache. The write buffer can be 
enabled by setting bit 5 of the Cache/BIU Control Register, as discussed in the
Programmer’s Model chapter, above. The write buffer enable bit should be written
only when the instruction and data caches are off. The write buffer works only
when both instruction and data caches are on.


The BIU also has a one-word prefetch buffer for instruction fetches. After an 
external instruction fetch, the prefetch buffer will initiate an access to the next 
sequential address, on the next available cycle. Instructions are prefetched only 
when the BIU does not have a request for a bus transaction from the IU, and no 
external device is requesting use of the bus. Prefetching is suspended if the buffer 
is full; this occurs if the prefetched instruction is a hit in the instruction cache or if 
the prefetched instruction is not used, as in the case of a branch to a different
address. Prefetching restarts after the next instruction cache miss. If an
exception occurs during an instruction prefetch, the exception is not sent to the IU
unless the instruction is actually requested by the IU. The prefetch buffer operates 
only when the instruction cache is on. 


3.3.2 Exception Handling 


The external memory system can indicate an exception during a memory opera- 
tion by asserting the -MEXC input. If -MEXC is asserted during an instruction 
fetch, the BIU indicates an instruction memory exception to the cache control 
logic and the IU. If -MEXC is asserted during a data fetch, the BIU indicates a 
data access exception to the cache control logic and the IU. 


As indicated above, the IU can continue to operate after putting the data and 
address for a store into the write buffer. If an exception is detected while complet- 
ing this buffered write then the BIU indicates a data access exception. Any system 
which wants to recover from this error should store the address and data for the 
write causing the exception, in a register. It should also have a status bit to indi- 
cate that the exception was caused during a write operation. It will be the respon- 
sibility of the data access exception service routine to determine the cause of the 
exception and recover accordingly. 


3.3.3 Effect on the Pipeline 


The pipeline hold signals, ihold and dhold, are generated if an instruction or data 
cannot be made available in the cycle that it is required by the pipeline. Normally 
ihold and dhold are not asserted if the required instruction or data is already in 
cache. On the other hand, if a cache miss occurs the cache controller requests that 
the appropriate data or instruction be fetched from the external system. On a
cache miss, the transaction will be available on the bus in the following clock cycle
if nothing of higher priority is pending (see below). A bypass exists that allows an
instruction or data word to be made available in the same cycle that it is being
written into cache. 


In general the following hierarchy rules apply to the bus interface unit: 
• the bus cycle currently in progress will complete
• if the write buffer is full, the buffer will be emptied
• if there is a pending request for a load or store operation it will be serviced
• if there is a pending request for an instruction it will be fetched
• if the prefetch buffer is empty, a prefetch cycle will be initiated
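The hierarchy rules above can be read as a fixed-priority selection. The following Python sketch is a simplified illustration, not a description of the actual BIU logic: the function and argument names are hypothetical, and conditions such as external bus requests or the prefetch buffer being disabled are not modeled.

```python
def next_bus_action(cycle_in_progress, write_buffer_full, data_request,
                    instruction_request, prefetch_buffer_empty):
    """Pick the next BIU action by walking the hierarchy rules in order."""
    if cycle_in_progress:
        return "complete current cycle"
    if write_buffer_full:
        return "empty write buffer"
    if data_request:
        return "service load/store"
    if instruction_request:
        return "fetch instruction"
    if prefetch_buffer_empty:
        return "prefetch"
    return "idle"
```

With the write buffer full and both a load and an instruction fetch pending, the sketch drains the write buffer first, then services the load, then fetches the instruction.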


This section illustrates the effect of bus operations on the instruction pipeline for 
some representative cases. 


Case 1: Cache Hits 


Figure 3-9 illustrates a sequence of hits in the instruction cache. The instruction 
fetched in cycle 0 is a STORE to location 0xF0. The data is written to the Write
Buffer in cycle 3, and to the bus in cycle 4. Since the write buffer is empty, the 
pipeline can move at a rate of one instruction per cycle, even when handling a 
STORE. LOAD instructions also do not hold up the pipeline, provided the source 
of the load is in the data cache. 


[Figure: timing diagram against Clock Cycle showing the Address Lines, Data Lines, and Ready Line waveforms]

Fetch         0x00  0x04  0x08  0x0C  0x10  0x14
Decode        0x00  0x04  0x08  0x0C  0x10  0x14
Execute       0x00  0x04  0x08  0x0C  0x10  0x14
Memory        0x00  0x04  0x08  0x0C  0x10  0x14
Write-Back    0x00  0x04  0x08  0x0C  0x10
Cache Status  hit   hit   hit   hit   hit

Configuration: Instruction Cache: ON   Pre-Fetch Buffer: Enabled   Memory Wait-State: 1
               Data Cache: -           Write Buffer: Enabled


Figure 3-9. Pipeline Operation: Cache Hits 
Case 2: Prefetch Buffer Disabled


Figure 3-10 illustrates the operation of the pipeline on instruction cache misses
when the prefetch buffer is disabled. The address of each missed instruction is
available on the processor external bus in the cycle following the miss. Since data
becomes available to both the IU and the cache on the same cycle, the pipeline can
proceed in the cycle immediately following the cycle in which the data appears on
the external bus.






[Figure: timing diagram against Clock Cycle showing the Address Lines and Data Lines waveforms]

Ready Line    1     1     0     1     1     0     1     1     0 ...
Fetch         0x00  0x00  0x00  0x04  0x04  0x04  0x08 ...
Decode        0x00  0x00  0x00  0x04  0x04  0x04
Execute       0x00  0x00  0x00  0x04
Memory        0x00
Write-Back
Cache Status  miss stall stall | miss stall stall | miss stall stall

Configuration: Instruction Cache: ON   Pre-Fetch Buffer: Disabled   Memory Wait-State: 1
               Data Cache: -           Write Buffer: -


Figure 3-10. Pipeline Operation: Prefetch Buffer Disabled 
Case 3: Prefetch Buffer Enabled


Figure 3-11 illustrates the operation of the pipeline on instruction cache misses 
when the prefetch buffer is enabled. The address of the instruction missed on 
cycle 0 is available on the system bus in cycle 1. In cycle 3, the pre-fetch buffer 
logic drives the next sequential word address onto the address lines. The instruc- 
tion cache miss at this location therefore causes the pipeline to be stalled for only 
one cycle. Contrast this with Case 2, above. Since the prefetched instruction is 
actually used by the processor, the prefetch buffer drives the next sequential 
word address in cycle 5. This saves a cycle on each access when executing sequen- 
tial code not already in cache. 


[Figure: timing diagram against Clock Cycle showing the Address Lines, Data Lines, and Ready Line waveforms]

Fetch         0x00  0x00  0x00  0x04  0x04  0x08  0x08
Decode        0x00  0x00  0x04  0x04
Execute       0x00  0x00  0x04
Memory
Write-Back
Cache Status  miss stall stall | miss stall | miss stall | miss

Configuration: Instruction Cache: ON   Pre-Fetch Buffer: Enabled   Memory Wait-State: 1
               Data Cache: -           Write Buffer: -


Figure 3-11. Pipeline Operation: Prefetch Buffer Enabled 
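The difference between Case 2 and Case 3 can be put into numbers with a small calculation. The sketch below is an illustration under the conditions of Figures 3-10 and 3-11 only (sequential code, every fetch missing the instruction cache, one memory wait state); the function name is hypothetical and the per-miss cycle counts are read off the Fetch rows of the two figures.

```python
def fetch_cycles(n_instructions, prefetch_enabled):
    """Cycles spent in Fetch for n sequential instruction-cache misses.

    Assumes one memory wait state, as in Figures 3-10 and 3-11.
    """
    if n_instructions <= 0:
        return 0
    first = 3                              # miss + two stall cycles
    later = 2 if prefetch_enabled else 3   # prefetch hides one stall cycle
    return first + (n_instructions - 1) * later
```

For three sequential misses this gives 9 cycles with the prefetch buffer disabled and 7 cycles with it enabled, which matches the saving of one cycle per access described in the text.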
Case 4: Data Cache Off


Figure 3-12 illustrates the operation of the pipeline on loads, with the data cache 
turned off and the instruction cache turned on. The instruction fetched in cycle 0 
is a LOAD from memory location 0xF0. The data is fetched when this instruction 
reaches the Memory stage in cycle 7. Since the data cache is off, the data must be 
fetched externally; this delays the next instruction fetch until cycle 9. 


Whenever a prefetch operation is held up by a load or store operation, the
prefetch buffer address is updated if the instruction it is pointing to is a hit in the
instruction cache. Therefore, when prefetch starts at cycle 9, the 0x10 instruction
address goes out on the address bus instead of 0x0C, which has already hit in the
cache.


[Figure: timing diagram against Clock Cycle showing the Address Lines, Data Lines, and Ready Line waveforms]

Fetch         0x00  0x00  0x00  0x04  0x04  0x08  0x08  0x0C  0x0C  0x10  0x10  0x10  0x10
Decode        0x00  0x00  0x04  0x04  0x08  0x08  0x0C  0x0C  0x0C  0x0C
Execute       0x00  0x00  0x04  0x04  0x08  0x08  0x08  0x08
Memory        0x00  0x00  0x04  0x04  0x04  0x04
Write-Back    0x00
Cache Status  miss stall stall | miss stall | miss stall | [illegible] stall | miss stall stall

Configuration: Instruction Cache: ON   Pre-Fetch Buffer: Enabled   Memory Wait-State: 1
               Data Cache: OFF         Write Buffer: -


Figure 3-12. Pipeline Operation: LOAD with Data Cache Turned Off 
Case 5: Data Cache Miss


Figure 3-13 illustrates the operation of the pipeline on loads, when the data access 
misses in the cache. The instruction fetched in cycle 0 is a LOAD from memory
location 0xF0. The data is required when this instruction reaches the Memory
stage in cycle 7. The access misses in the cache, so the data must be fetched exter- 
nally. At cycle 7, the prefetch operation has already started so the external load 
operation is delayed until the prefetch completes. At cycle 9, the external load 
operation takes place. At cycle 11, the now empty prefetch buffer initiates the next 
sequential instruction fetch at address 0x10. 





[Figure: timing diagram against Clock Cycle showing the Address Lines, Data Lines, and Ready Line waveforms]

Fetch         0x00  0x00  0x00  0x04  0x04  0x08  0x08  0x0C  0x0C  0x0C  0x0C  0x0C
Decode        0x00  0x00  0x04  0x04  0x08  0x08  0x08  0x08  0x08  0x0C
Execute       0x00  0x00  0x04  0x04  0x04  0x04  0x04  0x08
Memory        0x00  0x00  0x00  0x00  0x00  0x04
Write-Back    0x00
Cache Status  miss stall stall | miss stall | miss stall | D miss stall stall stall stall stall

Configuration: Instruction Cache: ON   Pre-Fetch Buffer: Enabled   Memory Wait-State: 1
               Data Cache: ON          Write Buffer: -


Figure 3-13. Pipeline Operation: Data Cache Miss 
External Interface


The processor’s external interface consists of signals, bus operations, and system 
support functions. This chapter details the MB86930 signal set, gives the relative 
timing of events in the principal types of bus operation, and describes the pro- 
grammable wait-state generator, on-chip timer, and same-page detection logic. 
For specific electrical and timing values, see the MB86930 Data Sheet. The System 
Design Considerations chapter of this document discusses issues that are likely to 
arise in the design of any SPARClite system. 


4.1 Signals 


The processor’s external signals are illustrated in Figure 1-6 of the Overview chap- 
ter, and are listed in Table 4-1 below. A dash at the beginning of a signal name, as 
in -RESET, indicates that the signal is active-low. 
Table 4-1: Input and Output Signals

[Table: signal names with pin type and bus-state columns. Signal names legible in the table: ADR<31:2>, D<31:0>, -CS0 to -CS5, -LOCK, -MEXC, -TIMER_OVF, -EMU_BRK, -SAME_PAGE, EMU_D<3:0>, RD/-WR, -EMU_ENB, -READY, -BREQ, EMU_SD<3:0>, -RESET, CLKOUT1, CLKOUT2, -ERROR, XTAL1 (CLKIN), XTAL2]

NOTE:
I = Input Only Pin
O = Output Only Pin
I/O = Either Input or Output Pin
Pins "must be" connected as described.
A(L) = Asynchronous: Inputs may be asynchronous to CLKOUT.
S(L) = Synchronous: Inputs must meet setup and hold times relative to CLKIN. Outputs are synchronous to CLKIN.
I(...) = While the bus is between bus cycles (or being reset) and is not granted to another bus master, the pin is:
    I(1) is driven to Vcc
    I(0) is driven to Vss
    I(Z) floats
    I(Q) is a valid output
G(...) = While the bus is granted to another bus master (-BGRNT=asserted), the pin is:
    G(1) is driven to Vcc
    G(0) is driven to Vss
    G(Z) floats
    G(Q) is a valid output
The following sections describe the signal set in detail, arranged by functional
group:

• Processor Control and Status—Reset, error, and clock signals.

• Memory Interface—Data and address buses, ASI and byte-enables,
chip-selects, and other control signals used to access external memory and
memory-mapped devices.

• Bus Arbitration—Signals used by external devices in requesting, and by the
processor in granting, control of the bus.

• Peripheral Functions—Interrupt-requests and timer overflow.

• Emulator Bus—Signals to support in-circuit emulation.

• Boundary-Scan—Test signals used for board verification, following JTAG
specifications.


4.1.1 Processor Control and Status 


CLKOUT1, CLKOUT2 CLOCK OUTPUTS (O): MB86930 bus transactions can be referenced against
these outputs. CLKOUT1 has the same frequency and phase as the internal
oscillator, or the signal applied to CLKIN. CLKOUT2 is the same as
CLKOUT1, but phase-shifted 180 degrees.


-ERROR ERROR SIGNAL (O): Asserted by the CPU to indicate that it has halted in an
error state as a result of encountering a synchronous trap while traps are
disabled. In this situation, the CPU saves the Trap Type (tt) value in the Trap
Base Register, enters an error state, and asserts the -ERROR signal. The
system can monitor the -ERROR pin and initiate a reset to recover from the
error condition.


-RESET SYSTEM RESET (I): Resets the processor to a known internal state. -RESET
should be asserted for at least 4 processor cycles after the clock has
stabilized. The internal state of the processor immediately after reset is
described in the Programmer’s Model chapter.


XTAL1 (CLKIN), XTAL2 EXTERNAL OSCILLATOR (XTAL1, XTAL2): Determines the execution rate
and timing of the processor. Connecting a crystal across these pins forms a
complete crystal oscillator circuit. The processor operating frequency is the
same as the crystal oscillator frequency.
The processor can also be driven by an external clock. In this case, the clock
signal is applied to XTAL1 (CLKIN); XTAL2 should be left unconnected. The
processor operating frequency is the same as the external clock frequency.
4.1.2 Memory Interface


ADR[31:2] ADDRESS BUS (O): Specifies the data or instruction address of a 32-bit
word. Reads are always one word in size, while byte, half-word, or word
transaction sizes for writes are identified by separate byte-enable signals
(-BE3-0). The value on the address bus is valid for the duration of the bus
transaction.


-AS ADDRESS STROBE (O): Asserted by the MB86930 or other bus master to
indicate the start of a new bus transaction. A bus transaction begins with the
assertion of -AS and ends with the assertion of -READY. During cycles in
which neither the processor nor another bus master is driving the bus, the bus
is idle, and -AS remains de-asserted. See Table 4-1 for signal values while
the bus is idle. The MB86930 asserts -AS for 1 clock cycle.


ASI[7:0] ADDRESS SPACE IDENTIFIERS (O): Indicate which of the 256 available
address spaces the current bus transaction is accessing. The ASI values are
defined as follows:


ASI <7:0>       ADDRESS SPACE

0x1             Control Register
0x2             Instruction Cache Lock
0x3             Data Cache Lock
0x4 - 0x7       Application Definable
0x8             User Instruction Space
0x9             Supervisor Instruction Space
0xA             User Data Space
0xB             Supervisor Data Space
0xC             Instruction Cache Tag RAM
0xD             Instruction Cache Data RAM
0xE             Data Cache Tag RAM
0xF             Data Cache Data RAM
0x10 - 0xFC     Application Definable
0xFD - 0xFF     Reserved for Debug Hardware





The ASI values specified as “application definable” can be used by privileged
(Supervisor mode) instructions such as load and store alternate. The ASI value
is available in the same cycle in which the corresponding address value is
asserted on the address bus. The values on the ASI pins are valid for the
duration of the bus transaction. Transactions with ASI values of 0x8, 0x9, 0xA,
and 0xB are cacheable.
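External decode logic or a simulator might classify an ASI value along the lines of the following Python sketch. It uses only the ranges stated in the text (0x8 to 0xB cacheable, 0x10 to 0xFC application definable, 0xFD to 0xFF reserved for debug hardware); the function names and the returned strings are hypothetical.

```python
CACHEABLE_ASIS = frozenset({0x8, 0x9, 0xA, 0xB})


def is_cacheable(asi):
    """True for the user/supervisor instruction and data spaces."""
    return asi in CACHEABLE_ASIS


def classify_asi(asi):
    """Coarse classification of an 8-bit ASI value."""
    if not 0 <= asi <= 0xFF:
        raise ValueError("ASI is an 8-bit value")
    if asi >= 0xFD:
        return "reserved for debug hardware"
    if asi >= 0x10:
        return "application definable"
    if is_cacheable(asi):
        return "cacheable instruction/data space"
    return "other (see ASI table)"
```

A transaction with ASI 0xA (User Data Space) is reported as cacheable, while ASI 0x20 falls in the application-definable range and is not.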


-BE3-0 BYTE ENABLES (O): Indicate whether the current load or store transaction is
a byte, half-word, or word transaction. The BYTE ENABLE value is available in
the same cycle in which the corresponding address value is asserted on the
address bus. The values on the byte enable pins are valid for load and store
operations and for the duration of the bus transaction (the byte enable signals
can be ignored during load operations).


Possible values for -BE3-0 are as follows:

                     31                          0
Byte Writes          1110    1101    1011    0111
Half-Word Writes     1100            0011
Word Writes          0000
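A memory controller model could recover the transaction width from the -BE3-0 pattern with a simple lookup. The Python sketch below covers only the seven legal patterns listed above; the names are hypothetical, and no assumption is made about which byte lane each bit controls.

```python
# Legal -BE3-0 patterns (active-low: a 0 bit enables a byte).
BE_WIDTH = {
    0b1110: "byte", 0b1101: "byte", 0b1011: "byte", 0b0111: "byte",
    0b1100: "half-word", 0b0011: "half-word",
    0b0000: "word",
}


def transaction_width(be):
    """Return the width encoded by a 4-bit -BE3-0 pattern."""
    try:
        return BE_WIDTH[be]
    except KeyError:
        raise ValueError(f"illegal -BE3-0 pattern: {be!r}") from None
```

Patterns that enable non-adjacent bytes, such as 1010, are rejected because they do not correspond to a byte, half-word, or word transfer.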
-CS5-0 CHIP SELECTS (O): One of these signals is asserted when the value on the
address bus lies in the range specified by the corresponding Address Range
Specifier Register. The -CS signals are used to decode the current address
into one of eight address ranges. Address ranges should not overlap. Each
address range has a corresponding wait-state specifier which is used to
generate an internal -READY signal after a user-defined number of processor
clock cycles. This allows a variety of memory and I/O devices with different
access times to be connected to the MB86930 without the need for additional
logic. -CS0 is enabled at reset (see Chapter 2).


D[31:0] DATA BUS (I/O): D31 corresponds to the most significant bit of byte 0. D0
corresponds to the least significant bit of byte 3. A double word is aligned on
an 8-byte boundary, a word is aligned on a 4-byte boundary, and a half-word
is aligned on a 2-byte boundary. If a load or store of any of these quantities is
not properly aligned, a mem_address_not_aligned trap will occur in the
processor.
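The alignment rule can be expressed as a one-line check. The following Python sketch is illustrative only; the table and function names are hypothetical.

```python
# Required alignment in bytes for each access size.
ALIGNMENT = {"byte": 1, "half-word": 2, "word": 4, "double-word": 8}


def is_aligned(addr, size):
    """True if addr would not raise mem_address_not_aligned for this size."""
    return addr % ALIGNMENT[size] == 0
```

For example, address 0x1004 is acceptable for a word access but not for a double-word access, which requires an 8-byte boundary.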


During write cycles, the point at which data is driven onto the bus depends on 
the type of the preceding cycle. If the preceding cycle was a write, data is 
driven in the cycle immediately following the cycle in which -READY was 
asserted. If the preceding cycle was a read, data is driven one cycle after the 
cycle in which -READY was asserted, in order to minimize bus contention 
between the processor and the system. 


-LOCK BUS LOCK (O): Asserted by the processor to indicate that the current bus
transaction requires more than one transfer on the bus. The Atomic Load
Store instruction, for example, requires contiguous bus transactions and so
causes the BUS LOCK signal to be asserted. The bus will not be granted to
another bus master as long as -LOCK is active. -LOCK is asserted with the
assertion of -AS and remains active until -READY is asserted at the end of
the locked transaction.


-MEXC MEMORY EXCEPTION (I): Asserted by the memory system to indicate a
memory error on either a data or instruction access. Assertion of this signal
initiates either a Data or Instruction Access Exception trap in the IU. The
current bus access is invalidated by asserting -MEXC in the same cycle
as the -READY signal. The IU ignores the value on the data bus in cycles
where -MEXC is asserted.


RD/-WR READ/WRITE BUS TRANSACTION (O): Specifies whether the current bus
transaction is a read or a write operation. When -AS is asserted and RD/-WR
is high, the current transaction is a read. With -AS asserted and RD/-WR
low, the current transaction is a write. RD/-WR remains active for the duration
of the bus transaction and is de-asserted with the assertion of -READY.
-READY READY (I): Asserted by the external memory system to indicate that the
current bus transaction is being completed and that it is ready to start the
next bus transaction in the following cycle. In the case of a fetch from memory, the
processor will strobe the value on the data bus at the rising edge of CLKIN
following the assertion of -READY. In the case of a write, the memory system
will assert -READY when the appropriate access time has been met.


In most cases, no external logic is required to generate the -READY signal.
On-chip circuitry can be programmed to assert -READY internally, based on
the address of the current transaction. The external system can override the
internal ready generator to terminate the current bus cycle early. Up to 6
address ranges, each with different transaction times, can be programmed.
(See the System Support Functions section, below.)


-SAME_PAGE SAME-PAGE DETECT (O): Asserted when the address of the current
memory access is within the same page as the previous memory access.
-SAME_PAGE can be used to take advantage of fast consecutive accesses
within page-mode DRAM page boundaries. -SAME_PAGE is asserted with
-AS and remains active for one processor cycle. -SAME_PAGE is never
asserted in the first transaction following a transaction by another device on
the bus. The page size is specified by writing the Same-Page Mask Register.
(See the System Support Functions section, below.)





4.1.3 Bus Arbitration 
-BGRNT BUS GRANT (O): Asserted by the CPU in response to a request from a
device wanting ownership of the bus. The CPU grants the bus to other devices
only after all transfers for the current transaction are completed. All bus drivers
are three-stated with the assertion of the BUS GRANT signal.


-BREQ BUS REQUEST (I): Asserted by another device on the bus to indicate that it
wants ownership of the bus. The request must be answered with a bus grant
(-BGRNT) from the MB86930 before the device can proceed by driving the
bus. Once the bus has been granted, the device has ownership of the bus until
it de-asserts -BREQ. The user should ensure that devices on the bus do not
monopolize the bus to the exclusion of the CPU. The assertion of -BREQ is
recognized by the processor even when -RESET is being asserted.
4.1.4 Peripheral Functions


IRL[3:0] INTERRUPT REQUEST BUS (I): The value on these pins defines the external 
interrupt level. IRL[3:0]=1111 forces a non-maskable interrupt. An IRL value of 
0000 indicates no pending interrupts. All other values indicate maskable 
interrupts as enabled in the Processor Interrupt Level field of the Processor 
Status Register (PSR). Interrupts should be latched and prioritized by external 
logic and should be held pending until acknowledged by the processor. An 
interrupt controller is available on the MB86940 peripheral chip. IRL inputs are 
sampled by the processor in cycle 1, synchronized in the following cycle, and 
recognized by the processor in the third cycle. 
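The IRL encoding can be summarized in a short decision function. The Python sketch below is an illustration, not a description of the on-chip logic: the function name is hypothetical, and the rules that a maskable request must exceed the PIL value and that no interrupt is taken while traps are disabled come from the general SPARC architecture rather than from this section.

```python
def classify_irl(irl, pil, traps_enabled=True):
    """Return 'non-maskable', 'maskable', or None if no interrupt is taken."""
    if not 0 <= irl <= 15 or not 0 <= pil <= 15:
        raise ValueError("IRL and PIL are 4-bit values")
    if irl == 0 or not traps_enabled:
        return None                 # 0000 means no pending interrupt
    if irl == 15:
        return "non-maskable"       # 1111 cannot be masked by PIL
    return "maskable" if irl > pil else None
```

With PIL set to 5, a request at level 6 is taken, a request at level 5 is held pending, and level 15 is always taken while traps are enabled.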


-TIMER_OVF TIMER OVERFLOW (O): Indicates that the processor’s internal 16-bit timer
has overflowed. This signal can be used to initiate a DRAM refresh cycle or a
one-cycle periodic waveform. On reset, the timer is turned off and
-TIMER_OVF is high.







4.1.5 Emulator Bus 


-EMU_BRK EMULATOR BREAK REQUEST LINE (I): Used to configure the debug unit
on reset. See section 2.6. This pin should be left unconnected.


EMU_D[3:0] EMULATOR DATA BITS (O): Reserved. These pins should be left
unconnected.


-EMU_ENB EMULATOR ENABLE (I): Used to configure the debug unit on reset. See
section 2.6. This pin should be left unconnected.


EMU_SD[3:0] EMULATOR STATUS/DATA BITS (I/O): Reserved. These pins should be left
unconnected.


4.1.6 Test and Boundary-Scan 


-CLK_ECB EXTERNAL CLOCK BYPASS (I): When tied high, causes the CLKIN signal to
bypass the on-chip phase-locked loop. This signal is intended primarily for
testing the chip.


-TRST† TEST RESET (I): Asynchronous reset for JTAG logic. If not using JTAG, this
signal must be pulled low.


† See appendix for more information.
4.2 Bus Operation


At any given time, the Bus Interface Unit is handling requests for external mem- 
ory and I/O operations, arbitrating for bus access, or idle. From the point of view 
of the external system, bus transactions are handled in fairly standard ways: 


• Memory and I/O Operations—Read and write transactions are initiated with
the processor asserting the -AS signal. The RD/-WR output indicates the
transaction type. The -BE[3:0] outputs indicate the transaction width. The
processor drives the address and ASI signals, and either drives (on stores) or
reads (on loads) the signals on the data bus. The transaction ends when
-READY is asserted.


An atomic load-store is executed as a load followed by a store, with no opera- 
tion allowed in between. The -LOCK output is asserted to indicate that the bus 
is being used for more than one consecutive memory operation. 


• Arbitration—Any external device can request ownership of the bus by
asserting the -BREQ signal. The processor three-states its bus drivers and
asserts -BGRNT to indicate that it is relinquishing control of the bus. On
completion of its transaction, the external device de-asserts -BREQ; the
processor responds by de-asserting -BGRNT in the following cycle.


The BIU receives requests for external memory operations from the Cache Con- 
trol Logic. In the case of reads from external memory, it performs the read opera- 
tion and returns the data to the Cache and IU. A parallel path is used to make the 
data available to the IU in the same cycle that it is written to the cache. 


In the case of a write to external memory, the BIU makes use of a write buffer 
which can hold a one word write transaction. When the BIU receives a request for 
a write transaction, it stores the write data and address in the write buffer, allow- 
ing the IU to continue operating out of on-chip cache. The BIU then proceeds to 
complete the write to external memory. In most cases the write buffer will hide 
external memory latency from the IU. The exceptions are in cases where the write 
buffer is still filled from a previous transaction or if the subsequent IU cycle 
results in an instruction cache miss. In these cases, IU execution is held until the 
write buffer is emptied. The write buffer operates only when the instruction and 
data caches are both on. 


The BIU includes a one-stage prefetch buffer for instruction fetches. This buffer is
used to fetch the next sequential instruction after an instruction cache miss. The
instruction is prefetched only if the BIU does not have a request for a bus
transaction from the IU and no external device is requesting use of the bus. The
prefetch buffer operation is suspended if the buffer is full. This occurs if the
prefetched instruction is a hit in the instruction cache or if a control transfer
causes the sequential instruction to be skipped. The buffer restarts after another
instruction cache miss. If an exception occurs during an instruction prefetch, the
exception is not sent to the IU unless the instruction is actually requested by the IU.
The prefetch buffer operates only when the instruction cache is on.


In any cycle the BIU can receive a request for access to instruction memory, data
memory, or both. If it receives a request for both in the same cycle, it completes
the data memory transaction first.


4.2.1 Exception Handling 


The external memory system can indicate an exception during a memory
operation. The BIU signals the appropriate data or instruction exception to the IU,
which will trap accordingly.


As mentioned above, the IU can continue operation after putting the data and 
address for a store in the write buffer. If an exception is detected while complet- 
ing this buffered write, then the BIU indicates a data access exception to the IU. 


Any system which needs to recover from this error should store the address and 
data of such write transactions in hardware. If the system can generate both read 
and write exceptions, then the system must also provide a status bit which indi- 
cates whether the exception was generated on a read or on a write transaction. 
With access to this information the data access exception service routine can 
determine the cause of the exception and recover accordingly. 
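The recovery scheme described above might look like the following Python sketch from the point of view of the service routine. Everything here is hypothetical: the `FaultLatch` class stands in for external hardware that a system designer would have to provide, and its field names are illustrative rather than part of the MB86930.

```python
from dataclasses import dataclass


@dataclass
class FaultLatch:
    """Hypothetical external hardware that captures a failing transaction."""
    was_write: bool   # status bit: True if the exception occurred on a write
    address: int      # latched address of the failing transaction
    data: int         # latched write data (meaningful only for writes)


def classify_data_access_exception(latch):
    """Decide what the data access exception service routine should recover."""
    if latch.was_write:
        # A buffered write failed after the IU had already moved on, so the
        # handler needs the latched address and data to retry or report it.
        return ("buffered write failed", latch.address, latch.data)
    return ("read failed", latch.address, None)
```

The status bit is what lets the handler tell a failed buffered write, whose store instruction has already left the pipeline, from a failed read.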
4.2.2 Bus Cycles


This section presents the relative timing of events in representative bus transac- 
tions. 


Load


Whenever an instruction fetch or a load from data memory has a miss in the 
cache, the BIU performs a read from external memory. 


A read transaction begins with the BIU asserting -AS to indicate a new bus
transaction. The -AS signal is de-asserted after one cycle. At the same time, the
ADR<31:2> and ASI<7:0> pins are driven with the location to be read. The BIU
drives the RD/-WR signal high to indicate a read transaction. Note that the -BE
lines indicate byte, halfword, or word operations during loads, although
their use is optional. The processor loads a word regardless of the size of data
requested (byte, halfword, word).





The external memory system responds with the read data on pins D<31:0>. It also
asserts the -READY signal when the data is ready (unless internal ready
generation is selected). For slow memory, the -READY signal is delayed until data is
valid.


A load double operation is treated as back-to-back reads. 


[Timing diagram: two back-to-back read cycles (LOAD 1, LOAD 2) showing
ADR<31:2>, ASI<7:0>, -BE<3:0>, RD/-WR, -READY and D<31:0>]


Figure 4-1. Load Timing
Load with Exception


If the external memory system sees a memory exception, it can terminate the cur- 
rent memory transaction by asserting the -MEXC and —READY signals. The data 
on the data bus is ignored by the MB86930. 






[Timing diagram: read cycle terminated by an exception, showing ADR<31:2>,
ASI<7:0>, -BE<3:0>, RD/-WR, -READY, -MEXC and D<31:0>]


Figure 4-2. Load with Exception Timing
Store


A write transaction begins with the BIU asserting -AS to indicate a new bus
transaction. The -AS signal is de-asserted after one phase. At the same time, the
ADR<31:2> and ASI<7:0> pins are driven with the location to be written, while
the D<31:0> pins carry the corresponding write data. The -BE<3:0> pins indicate
byte, halfword or word transaction width. The BIU drives the RD/-WR signal low to
indicate a write transaction.


The external memory system responds by asserting the -READY signal when it 
has stored the data. There is always one idle bus cycle between the termination of 
a read cycle and the beginning of a write cycle to provide time for switching of the 
data bus drivers. 


A store double operation is treated as back-to-back writes. 


[Timing diagram: two back-to-back write cycles (STORE 1, STORE 2) showing
CLK_IN, ADR<31:2>, ASI<7:0>, -BE<3:0>, RD/-WR and -READY]


Figure 4-3. Store Timing
Store with Exception


If an access exception occurs on a write, the external memory system can termi- 
nate the current memory transaction by asserting the -MEXC and —READY sig- 
nals. The external memory system is expected to ignore the data on the data bus 
in this situation. 









[Timing diagram: write cycle (STORE 1) terminated by an exception, showing
CLK_IN, -BE<3:0>, -AS, RD/-WR, -READY, -MEXC and D<31:0>]


Figure 4-4. Store with Exception Timing
Atomic Load Store


An atomic load store executes as a load followed by a store with no operation 
allowed in between. The -LOCK signal is asserted to indicate that the bus is being 
used for more than one external memory operation. 


There is one cycle between the termination of the read and the beginning of the 
write to provide time for the switching of the data bus drivers. 


[Timing diagram: LOAD 1, idle cycle, STORE 1, showing CLK_IN, ADR<31:2>,
ASI<7:0>, -BE<3:0>, RD/-WR, -READY, -LOCK and D<31:0>]


Figure 4-5. Atomic Load Store Timing
External Bus Request and Grant


Any external device can request ownership of the bus by asserting the -BREQ sig- 
nal. The BIU asserts the -BGRNT signal to indicate that it is relinquishing control 
of the bus and also three-states all of its bus drivers. In the following cycle, the 
external device can complete its transaction. On completion of its transaction the 
external device de-asserts the -BREQ signal. The BIU responds by de-asserting 
the -BGRNT signal in the following cycle. 


The MB86930 is the default owner of the bus. 





[Timing diagram: processor bus cycle n completes, all bus drivers are
three-stated while -BGRNT is asserted, then processor bus cycle n+1 starts;
signals shown are CLK_IN, -BREQ and -BGRNT]


Figure 4-6. External Bus Request and Grant Timing
Processor Reset


The MB86930 is reset by asserting the -RESET signal for a minimum of 4 clock
cycles (see Figure 4-7). Systems using an external crystal to clock the processor
should ensure that -RESET is asserted for at least 4 cycles after the crystal has
started up and has stabilized.


If the processor is reset following a halt in Error Mode, and if power to the proces- 
sor is not removed, the tt field after reset will contain the value of the Trap that 
caused the processor to halt. 


[Timing diagram: CLK_IN, -RESET and ADDR, annotated "4 CYCLE MINIMUM" and
"3 CYCLES"]


Figure 4-7. Reset Timing


4.3 System Support Functions 


Built-in system support functions help to minimize the amount of glue logic 
required in the external system. The support includes programmable chip select 
logic, programmable wait-state generation, same-page detection logic and a timer 
for generating refresh requests. For a more detailed description of the program- 
ming of these registers refer to chapter 2. 


The System Support Control Register turns the various system support features 
on and off. 







Same-Page Enable (On=1, Off=0) 
Chip Select Enable (On=1, Off=0) 
Programmable Wait-State (On=1, Off=0) 
Timer On/Off (On=1, Off=0)

Reserved 


Figure 4-8. System Support Control Register 



4.3.1 System-Configuration Registers 


The system-configuration registers (Address Range Specifiers, Address Masks, 
and Programmable Wait-State Specifiers) allow software to define six different 
address ranges. When an address driven by the processor is in one of these 
ranges, the corresponding Chip-Select (-CS) pin is asserted. After a number of 
clock cycles determined by the corresponding Programmable Wait-State Speci- 
fier, the processor automatically generates an internal -READY signal. This 
makes it possible for memory and I/O devices with different access times to be 
connected to the processor without additional logic. 


The contents of the Address Range Specifier Registers 1-5 (ARSR[5:0]) define five
of the six address ranges. An additional address range is available, corresponding
to -CS0. For this address range, ADR is hardwired to 0, and ASI is hardwired to
0x9 (Supervisor Instruction Space). With Mask Register AMR0, -CS0 spans 8K
words. -CS0 is enabled at reset. -CS1, -CS2, -CS3, -CS4 and -CS5 are disabled at
reset.


Bits 30:23   ASI<7:0>
Bits 22:1    ADR<31:10>


Figure 4-9. Address Range Specifier Register Format


An Address Mask Register is associated with each address range. Any address 
driven by the chip is compared with the value in all address range specifiers. 
Only those bits of the register are compared for which the corresponding mask 
bits are 0. If the specified bits of the current address match one of the address 
range specifiers, the corresponding chip-select (-CS) pins are asserted. When no 
bus transaction is being performed, all the —CS pins are high (inactive). The 
Address Mask Register corresponding to —CS0 is initialized to compare all bits 
except ADR<14:10>. 
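The comparison described above can be sketched as a small behavioral model. It is an illustration only, not a description of the silicon: the function names are hypothetical, and the field positions (ASI<7:0> in bits 30:23, ADR<31:10> in bits 22:1) are taken from the register formats in Figures 4-9 and 4-10.

```python
ASI_SHIFT = 23            # ASI<7:0> sits in register bits 30:23
ADR_SHIFT = 1             # ADR<31:10> sits in register bits 22:1
FIELD_MASK = 0x7FFFFFFE   # bits 31 and 0 belong to neither field


def encode(asi, addr):
    """Pack an ASI and a byte address into the specifier layout."""
    return ((asi & 0xFF) << ASI_SHIFT) | (((addr >> 10) & 0x3FFFFF) << ADR_SHIFT)


def chip_select_hit(asi, addr, arsr, amr):
    """True when every bit whose mask bit is 0 matches the range specifier."""
    care = ~amr & FIELD_MASK
    return ((encode(asi, addr) ^ arsr) & care) == 0


# Example: compare ASI 0xb / ADR<31:28> = 0x1, ignoring ASI<1:0> and ADR<27:10>.
ARSR1 = 0xB1 << 19
AMR1 = ~(0xFCF << 19) & 0xFFFFFFFF
print(chip_select_hit(0x8, 0x1234ABCD, ARSR1, AMR1))
print(chip_select_hit(0x9, 0x00001000, ARSR1, AMR1))
```

With this mask, any access whose top address nibble is 0x1 in ASI 0x8 through 0xb hits the range, while an access to a different nibble, or in an ASI outside that group, does not.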


Bits 30:23   ASI<7:0>
Bits 22:1    ADR<31:10>


Figure 4-10. Address Mask Register Format
A Programmable Wait-State Specifier is associated with each address range.
Three registers are used to specify the wait states for the six address ranges. Each 
register contains the wait-state specifiers for two address ranges. 


When the address currently being driven by the processor matches the unmasked 
bits in one of the Address Range Specifiers, the corresponding wait-state specifier 
is selected. The format of Wait-State Specifier Registers is shown in Figure 4-11. 


[Register diagram: bit positions marked 27, 26, 22, 21, 20, 19, 18, 14, 13, 9,
8, 7, 6 and 5; flag bits Wait Enable (On=1, Off=0), Single Cycle (On=1, Off=0)
and Override (On=1, Off=0)]


Figure 4-11. Wait-State Specifier Register Format


If the Single Cycle bit equals 1, an internal -READY signal is generated in the
same cycle. If the Single Cycle bit equals 0, and the current transaction is in the
same page as the previous transaction (see the Same-Page Detection Logic section,
below), then Count2 + 1 is used as the number of cycles after which -READY is
asserted internally. If the transaction is not in the same page, Count1 + 1 is used
instead. If the Wait Enable bit equals 0, the internal -READY is not asserted.
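The selection rule above can be written out as a short sketch. Two things in it are assumptions rather than statements from this manual: that a cleared Wait Enable bit takes precedence over the Single Cycle bit, and the bit order inside one 13-bit specifier (Count1, Count2, Wait Enable, Single Cycle, Override, from most to least significant), which is inferred from the bit positions marked in Figure 4-11. The function names are hypothetical.

```python
def decode_specifier(field):
    """Split one 13-bit wait-state specifier (assumed field order)."""
    return {
        "count1": (field >> 8) & 0x1F,
        "count2": (field >> 3) & 0x1F,
        "wait_enable": (field >> 2) & 1,
        "single_cycle": (field >> 1) & 1,
        "override": field & 1,
    }


def ready_delay(spec, same_page):
    """Cycles until the internal -READY, or None if it is never generated."""
    if not spec["wait_enable"]:      # assumption: this check comes first
        return None
    if spec["single_cycle"]:
        return 0                     # -READY in the same cycle
    count = spec["count2"] if same_page else spec["count1"]
    return count + 1


spec = decode_specifier(0x634)       # counts of 6, wait enable set
print(ready_delay(spec, same_page=False))
```

A specifier with both counts at their maximum of 31 gives 32 cycles, which agrees with the reset behavior described for -CS0.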


The Override bit allows the user to terminate a transaction earlier than the speci- 
fied time. If this bit equals 1, and external hardware asserts the external -READY 
signal, then the wait-state generator will stop counting and will wait for the next 
transaction, which can occur as soon as the next clock cycle. 


The Count1 and Count2 fields of the Wait-State Specifier corresponding to -CS0
have all their bits set to 1 on reset. In this way, 32 wait-state cycles (the maximum
number) are inserted into the processor's first instruction accesses. The override
bit for -CS0 is enabled as well.


4.3.2 Same-Page Detection 


The same-page detection logic determines whether the address of the current 
memory transaction is on the same page as the previous transaction. If it is, the 
processor asserts the -SAME_PAGE signal. The system can then take advantage 
of the fast consecutive accesses possible within fast-page mode DRAM page 
boundaries. The same-page detection logic consists of a mask register, a register 
to store the address and ASI bits of the previous transaction, and a comparator. 
The Same-Page Mask Register specifies which bits of the current address and ASI
must be compared with the previous address and ASI. Only those bits are
compared for which the mask bit is 0 (Care=0, Don't Care=1).


Bits 30:23   ASI Mask (Care=0, Don't Care=1)
Bits 22:1    Address Mask (ADR<31:10>) (Care=0, Don't Care=1)


Figure 4-12. Same-Page Mask Register


The -SAME_PAGE signal is never asserted for the first transaction following a
transaction by another device on the bus. When using the internal wait-state
generator, DRAM control logic should issue a bus request when initiating a refresh
cycle so that the -SAME_PAGE logic is reset appropriately. The same-page
feature is disabled at reset.
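A minimal model of the same-page comparator is sketched below. It assumes the Care=0, Don't Care=1 convention shown in Figure 4-12 and the same field packing as the Address Range Specifier (ASI in bits 30:23, ADR<31:10> in bits 22:1). The names are hypothetical, and the `bus_was_granted` flag stands in for "another device used the bus since the previous transaction".

```python
FIELD_MASK = 0x7FFFFFFE


def pack(asi, addr):
    """Pack ASI<7:0> and ADR<31:10> the way the mask register is laid out."""
    return ((asi & 0xFF) << 23) | (((addr >> 10) & 0x3FFFFF) << 1)


def same_page(prev, cur, mask, bus_was_granted=False):
    """prev and cur are (asi, addr) pairs; prev is None for the first access."""
    if prev is None or bus_was_granted:
        return False
    care = ~mask & FIELD_MASK
    return ((pack(*prev) ^ pack(*cur)) & care) == 0


MASK_1K_ANY_ASI = 0x7F800000   # ignore ASI, compare all of ADR<31:10>
print(same_page((0xA, 0x1000), (0xA, 0x13FC), MASK_1K_ANY_ASI))
print(same_page((0xA, 0x1000), (0xA, 0x1400), MASK_1K_ANY_ASI))
```

With every address bit compared and every ASI bit ignored, two accesses are on the same page exactly when they fall in the same 1K-byte block.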


4.3.3 Programmable Timer 


The 16-bit programmable timer causes the -TIMER_OVF output signal to be 
asserted at software-defined intervals. This signal can be used to initiate DRAM 
refresh cycles, or to control other periodic events in the external system. 


The current timer count is kept in the Timer Register. When the timer overflows, 
it is loaded with the value in the Timer Preload Register. The contents of both of 
these registers are undefined on reset. 


[Register diagrams: the Timer Register and the Timer Preload Register, each
drawn with bit boundaries 31, 16, 15 and 0]


Figure 4-13. Timer and Timer Preload Registers


The timer can also be loaded by writing directly to the Timer Register. The timer 
can be turned off by writing a 0 to the Timer On/Off bit in the System Support 
Control register. The timer is clocked at the processor clock frequency. 
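Because the timer is clocked at the processor clock frequency, an interval is most easily planned in clock cycles. The sketch below only converts a wanted interval into a cycle count and checks that it fits a 16-bit counter. The function name and the example figures (a 20 MHz clock and a 15.6 microsecond refresh interval) are assumptions for illustration, and the sketch deliberately does not say how the cycle count maps onto the Timer Preload value.

```python
def interval_in_cycles(clock_hz, interval_s):
    """Number of processor clocks in interval_s, checked against 16 bits."""
    cycles = round(clock_hz * interval_s)
    if not 1 <= cycles <= 0xFFFF:
        raise ValueError("interval does not fit a 16-bit timer")
    return cycles


# Hypothetical system: 20 MHz clock, DRAM row refresh every 15.6 us.
print(interval_in_cycles(20_000_000, 15.6e-6))
```

An interval that needs more than 0xFFFF cycles is rejected; a system needing such an interval would have to count several timer periods in external logic or software.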
Programming Considerations


This chapter gives programmers information and advice about how to make the 
best use of SPARClite processors. It discusses the initialization of a SPARClite 
system, the design of trap handlers, window management, the use of on-chip 
cache, and SPARClite-specific instructions. 


Because of the availability of high-performance optimizing compilers, real-time 
operating systems, target monitors and application software, many programmers 
will never need to program at the detail described in this chapter. However, for 
those writing their own kernels or operating systems, and for those wanting to 
hand optimize compiler code, sections in this chapter will prove useful. 


Most of the sections in this chapter contain code fragments illustrating the points 
under discussion. In some sections, complete subroutines are provided which can 
be used without modification in real systems; the integer multiplication and 
division routines are a good example. 


To follow the discussion and examples in this chapter, you should be familiar 
with the contents of Chapter 2, Programmer’s Model. You should also know how to 
read SPARC assembly language (see Chapter 7). 


5.1 Initialization 


Processor reset occurs when the external system asserts the -RESET input. Upon 
reset, the processor is in supervisor mode. It begins fetching and executing 
instructions starting at address 0x00000000 in Supervisor Instruction Space (ASI 
0x9). The S bit of the PSR is set to 1; the ET bit is cleared to 0. The tt field of the
Trap Base Register remains unchanged and identifies the last trap encountered if 
reset occurs without removing power from the processor. This provides a way to 
trace the origin of a halt to error mode (on power-up, the tt field is undefined). All 
other fields of the SPARC control and status registers (PSR, WIM, TBR, and Y) are 
undefined on reset. 


The Cache/BIU Control Register and System-Support Register are cleared to 0;
that is, the various features controlled by these registers are turned off (except for
-CS0). The contents of the on-chip cache and the various system-configuration
registers are undefined (see Chapter 2 for details).


5.1.1 Establishing the Processor State 


The first task of initialization code is to establish the processor state, as in the 
following code fragment: 


! Reset Initialization

        wr      %g0, 0x0fa7, %psr   ! Set psr: mask interrupts, mode=S, Pmode=U,
                                    ! traps enabled, CWP=7
        wr      %g0, 0x0, %wim      ! Initialize wim to window 0
        wr      %g0, 0x0, %tbr      ! Initialize tbr to 0


Writes to the PSR, WIM, and TBR registers are delayed by three instruction cycles;
that is, the value in the register is undefined for three instructions following the
write. Accessing one of these registers, either explicitly or implicitly, within three
instructions after a write can lead to unpredictable results.
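A PSR value can be checked against its intended meaning by decoding its fields. The sketch below assumes the standard SPARC V8 PSR layout (PIL in bits 11:8, S in bit 7, PS in bit 6, ET in bit 5, CWP in bits 4:0); the function name is hypothetical. The value 0x0fa7 decodes to the settings named in the comments of the reset fragment.

```python
def decode_psr(psr):
    """Extract the low-order control fields of a SPARC V8 PSR value."""
    return {
        "PIL": (psr >> 8) & 0xF,   # processor interrupt level
        "S": (psr >> 7) & 1,       # supervisor mode
        "PS": (psr >> 6) & 1,      # previous supervisor mode
        "ET": (psr >> 5) & 1,      # enable traps
        "CWP": psr & 0x1F,         # current window pointer
    }


print(decode_psr(0x0FA7))
```

A PIL of 15 masks all maskable interrupt levels, so the decoded fields read as "interrupts masked, supervisor mode, previous mode user, traps enabled, window 7".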


5.1.2 Configuring the System 


Initialization code must also configure the system by writing appropriate values 
into the system-configuration registers (Address Range Specifiers and Masks, 
Wait-State Specifiers, Same-Page Mask, and the Timer Registers). Figure 5-1 
shows the memory map of a simple example system. 


[Memory map: the -CS1 subsystem occupies the range from 0x10000000 up to
0x20000000; the range starting at 0x00000000 lies below it]


Figure 5-1. Example System Memory Map
The following code sets the various system-configuration registers to values
appropriate for the example system. 


! Address Range Register and Address Mask Register for -CS0 and
! -CS1 are set here. Only the highest nibble of the addresses
! is used for mapping the different -CS signals as shown in Figure 5-1.
! Note: Address range register for -CS0 is preset to 0x04 80 00 00
!       ASI=0x9, addr<31:10>=0x0

        sethi   %hi(0xfdf<<19), %l0
        xnor    %g0, %l0, %l0       ! Set address mask register for -CS0
        or      %g0, 0x140, %l1     ! ASI<1>=x, addr<27:0>=0xXXXXXXX
        sta     %l0, [%l1] 1        ! SI and SD ASI, addr=0x0XXXXXXX
        sethi   %hi(0xb1<<19), %l0  ! Set address range register for -CS1:
        or      %g0, 0x124, %l1     ! ASI=0xb, addr<31:28>=0x1
        sta     %l0, [%l1] 1
        sethi   %hi(0xfcf<<19), %l0
        xnor    %g0, %l0, %l0       ! Set address mask register for -CS1
        or      %g0, 0x144, %l1     ! ASI<1,0>=xx, addr<27:0>=0xXXXXXXX
        sta     %l0, [%l1] 1        ! SI, SD, UI and UD ASI, addr=0x1XXXXXXX

! Set Wait State Specifier Registers
! Note: count=WS-1, WS+1=cycles, count=cycles-2
! Wait state value is for -CS0 (ROM) and is set to:
!       count=6, wait en=1, single cyc=0, override=0
! Wait state value is for -CS1 (subsystem) and is set to:
!       count=0, wait en=0, single cyc=0, override=0

        or      %g0, 0x160, %l1     ! -CS0 and -CS1 WSS Register
        or      %g0, 0x634, %l0
        sll     %l0, 6, %l0
        sta     %l0, [%l1] 1
        .align  4
        .word   0xa3802001          ! Set Ancillary Register 17 bit 0
                                    ! to enable single vector trapping.
                                    ! Machine code is used here for assemblers
                                    ! which do not have the WR ASR instruction.
        or      %g0, 0, %l0         ! Write 0 into Cache/BIU Control Reg
        sta     %l0, [%g0] 1        ! disabling all caches
        set     0xffff, %l0         ! Set Timer Pre-Load Register
        or      %g0, 0x174, %l1     ! Reload value is set to 0xffff
        sta     %l0, [%l1] 1
        set     0x7f800000, %l0     ! Set Same-Page Mask Register
        or      %g0, 0x120, %l1     ! Page size is set to 1K for any ASI
        sta     %l0, [%l1] 1
        or      %g0, 0x3c, %l0      ! Set System Support Control Reg:
        or      %g0, 0x80, %l1      ! -SAME_PAGE, -CS<5-1>, WS generator and
        sta     %l0, [%l1] 1        ! -TIMER_OVF are all enabled


5.1.3 Initializing the On-Chip Cache


On reset, both caches are turned off, and all memory requests are sent to the Bus
Interface Unit. In order to use the caches, software must initialize the Valid, Least
Recently Used and Entry Lock bits by writing 0's to the appropriate alternate
address spaces. After initializing the cache, a program can write 1's to the Cache
Enable bits of the Cache/BIU control register to turn the caches on. The prefetch
and write buffers of the BIU can be turned on in the same operation.


The following code initializes the data and instruction caches, then enables
caching and BIU buffering.

#define set_size 64
#define ini_tag 0
#define adr1 0
#define adr2 0x80000000
#define CTL_BITS 0x35 /* turn on i-cache, d-cache, prefetch buf., write buf. */
#define icache_lock_bit 0x1
#define dcache_lock_bit 0x3
#define icache_lock 0x8
#define dcache_lock 0xa
#define icache_enlock 0x
#define dcache_enlock 0x2
#define lock_reg_adr 0x4
#define lock_save_adr 0x8
        .seg    "text"
        set     set_size, %l7       /* RAM size */
        set     adr1, %o0           /* start address, set 1 */
        set     adr2, %o2           /* start address, set 2 */
        set     ini_tag, %l0        /* initial tag value */
loopinit:
        sta     %l0, [%o0] 0xc      ! write set 1, itag
        sta     %l0, [%o0] 0xe      ! write set 1, dtag
        sta     %l0, [%o2] 0xc      ! write set 2, itag
        sta     %l0, [%o2] 0xe      ! write set 2, dtag
        add     %o0, 16, %o0        ! inc by 4 words (each tag serves 4 words)
        subcc   %l7, 1, %l7
        bne     loopinit
        add     %o2, 16, %o2        ! delay slot
        set     0x0, %l1            ! address of Cache/BIU Control Register
        set     CTL_BITS, %l7       ! turn on caches
        sta     %l7, [%l1] 1
        nop                         ! some nop's for transition
        nop
        nop
        nop
5.2 Trap Handling


An interrupt or trap (other than reset) causes a vectored transfer of control into a 
trap table. The first four instructions of each trap handler are in the trap table 
itself. The Trap Base Address field in the Trap Base Register contains the base 
address of the table. Associated with each trap type is an 8-bit value, which (left 
shifted by 4 bits) is used as an offset into the table. From the trap table, control 
typically passes (via a JMPL or BA instruction) to the appropriate trap handler. A 
trap table with base address 0x00000000 is shown in the following code fragment. 
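The offset calculation described above can be illustrated with a one-line sketch. It assumes the standard SPARC V8 Trap Base Register layout (Trap Base Address in bits 31:12, tt in bits 11:4); the function name is hypothetical.

```python
def trap_vector(tbr, tt):
    """Address of the first trap-table instruction for trap type tt."""
    return (tbr & 0xFFFFF000) | ((tt & 0xFF) << 4)


# With the table based at 0, trap type 0x05 starts at offset 0x50.
print(hex(trap_vector(0x00000000, 0x05)))
```

Each entry is 16 bytes, which is why only four instructions of a handler fit in the table itself.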


Note that since -CS0 is selected for address range 0x0-0x3fff, the branch after reset
at address 0x0 must vector within this address range if the internally generated
chip select is being used. There is sufficient space after the trap handler (at label
"start" below), yet still within the -CS0 default range, to write the -CS0 mask
register if required.





   0  T_reset:                      mov   0xe0, %psr
   4                                mov   %g0, %tbr
/*
/* 0 -> TBR assumes boot is from fast memory, and that only the
/* first 4 instructions of the response to reset are there. Single
/* Vector Trapping is to remain disabled.
*/
   8                                ba    start
   c                                mov   %g0, %wim
  10  T_inst_access_exception:      rd    %tbr, %l3
  14                                rd    %psr, %l0
  18                                ba    iae_handler
  1c                                nop
  20  T_unimplemented_instruction:  rd    %tbr, %l3
  24                                rd    %psr, %l0
  28                                ba    illegal
  2c                                nop
  30  T_privileged_instruction:     rd    %tbr, %l3
  34                                rd    %psr, %l0
  38                                ba    privileged
  3c                                nop
  40  T_fp_disabled:                rd    %tbr, %l3
  44                                rd    %psr, %l0
  48                                ba    fp_disabled
  4c                                nop
  50  T_window_overflow:            rd    %tbr, %l3
  54                                rd    %psr, %l0
  58                                ba    win_overflow
  5c                                nop
  60  T_window_underflow:           rd    %tbr, %l3
  64                                rd    %psr, %l0
  68                                ba    win_underflow
  6c                                nop
  70  T_mem_addr_not_aligned:       rd    %tbr, %l3
  74                                rd    %psr, %l0
  78                                ba    misaligned_addr
  7c                                nop
  80  T_fp_exception:               rd    %tbr, %l3
  84                                rd    %psr, %l0
  88                                ba    unimplemented_trap
  8c                                nop
  90  T_data_access_exception:      rd    %tbr, %l3
  94                                rd    %psr, %l0
  98                                ba    dae_handler
  9c                                nop
  a0  T_tag_overflow:               rd    %tbr, %l3
  a4                                rd    %psr, %l0
  a8                                ba    tag_overflow
  ac                                nop

  b0                                rd    %tbr, %l3
  b4                                rd    %psr, %l0
  b8                                ba    unimplemented_trap
  bc                                nop
  c0                                rd    %tbr, %l3
  c4                                rd    %psr, %l0
  c8                                ba    unimplemented_trap
  cc                                nop

 100                                rd    %tbr, %l3
 104                                rd    %psr, %l0
 108                                ba    unimplemented_trap
 10c                                nop
 110                                rd    %tbr, %l3
 114                                rd    %psr, %l0
 118                                ba    int_handler
 11c                                nop
 120                                rd    %tbr, %l3
 124                                rd    %psr, %l0
 128                                ba    int_handler
 12c                                nop

 1f0                                rd    %tbr, %l3
 1f4                                rd    %psr, %l0
 1f8                                ba    rt_handler
 1fc                                nop

 200  T_rerr:                       rd    %tbr, %l3
 204                                rd    %psr, %l0
 208                                ba    unimplemented_trap
 20c                                nop
 210  T_iaerr:                      rd    %tbr, %l3
 214                                rd    %psr, %l0
 218                                ba    iae_handler
 21c                                nop
 220                                rd    %tbr, %l3
 224                                rd    %psr, %l0
 228                                ba    unimplemented_trap
 22c                                nop
 230                                rd    %tbr, %l3
 234                                rd    %psr, %l0
 238                                ba    unimplemented_trap
 23c                                nop
 240  T_cp_disabled:                rd    %tbr, %l3
 244                                rd    %psr, %l0
 248                                ba    cp_disabled
 24c                                nop
 250                                rd    %tbr, %l3
 254                                rd    %psr, %l0
 258                                ba    unimplemented_trap
 25c                                nop
 260                                rd    %tbr, %l3
 264                                rd    %psr, %l0
 268                                ba    unimplemented_trap
 26c                                nop
 270                                rd    %tbr, %l3
 274                                rd    %psr, %l0
 278                                ba    unimplemented_trap
 27c                                nop
 280  T_cp_exception:               rd    %tbr, %l3
 284                                rd    %psr, %l0
 288                                ba    unimplemented_trap
 28c                                nop
 290  T_daerr:                      rd    %tbr, %l3
 294                                rd    %psr, %l0
 298                                ba    dae_handler
 29c                                nop
 2a0                                rd    %tbr, %l3
 2a4                                rd    %psr, %l0
 2a8                                ba    unimplemented_trap
 2ac                                nop
 2b0                                rd    %tbr, %l3
 2b4                                rd    %psr, %l0
 2b8                                ba    unimplemented_trap
 2bc                                nop

 800  software_traps:               rd    %tbr, %l3
 804                                rd    %psr, %l0
 808                                ba    trap_instr
 80c                                nop
 810                                rd    %tbr, %l3
 814                                rd    %psr, %l0
 818                                ba    trap_instr
 81c                                nop

 fe0                                rd    %tbr, %l3
 fe4                                rd    %psr, %l0
 fe8                                ba    trap_instr
 fec                                nop
 ff0                                rd    %tbr, %l3
 ff4                                rd    %psr, %l0
 ff8                                ba    emu_exception
 ffc                                nop
1000  start:


When a trap is taken, the processor writes the trap type number into the tt field of
the Trap Base Register, and disables traps by clearing the ET bit of the Processor
Status Register. The processor enters supervisor mode (S=1), saving the old state
of the S bit in the PS field of the PSR. The Current Window Pointer is
automatically decremented.


Each of the illustrated trap handlers (except for reset) begins by saving the values 
of the TBR and PSR, and then jumps, by means of an unconditional branch, to the 
next instruction in the service routine. 


Each trap handler must then:


1. Ensure that a window is available, in case another trap occurs. (When it takes a
   trap, the processor automatically saves the window of the interrupted routine
   by decrementing the Current Window Pointer.)

2. Re-enable traps by setting the ET bit of the PSR.

3. Handle the exceptional condition that caused the trap.

4. Ensure that a window is available, so that the RETT (return from trap)
   instruction can restore the window of the interrupted routine by incrementing
   the CWP.

5. Disable traps by clearing the ET bit of the PSR.

6. Execute a JMPL/RETT instruction pair. The address for the return is found in
   r[17]. (When it takes a trap, the processor loads r[17] with the value in the PC.)
   The RETT instruction automatically re-enables traps.
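The state changes the processor makes on trap entry can be summarized in a small behavioral sketch. It is an illustration under stated assumptions, not a definition of the hardware: the dictionary keys and function name are hypothetical, eight register windows are assumed, and the saving of nPC in r[18] follows the standard SPARC V8 convention, which is what the return sequences in this section rely on.

```python
NWINDOWS = 8  # assumption: window count of the modelled processor


def take_trap(state, tt):
    """Return a copy of the processor state after a trap of type tt."""
    new = dict(state)
    new["ET"] = 0                                  # traps disabled
    new["PS"] = state["S"]                         # old mode saved in PS
    new["S"] = 1                                   # supervisor mode
    new["CWP"] = (state["CWP"] - 1) % NWINDOWS     # move to the trap window
    new["tt"] = tt                                 # trap type into the TBR
    new["r17"] = state["PC"]                       # %l1 of the trap window
    new["r18"] = state["nPC"]                      # %l2 of the trap window
    new["PC"] = (state["TBA"] & 0xFFFFF000) | (tt << 4)
    new["nPC"] = new["PC"] + 4
    return new


before = {"S": 0, "PS": 0, "ET": 1, "CWP": 0, "tt": 0,
          "PC": 0x2000, "nPC": 0x2004, "TBA": 0}
print(take_trap(before, 0x05))
```

Because ET is 0 on entry, a second trap taken before the handler has secured a free window and re-enabled traps cannot be serviced, which is why steps 1 and 2 above come first.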
To re-execute the trapped instruction when returning from a trap handler use the
sequence: 


        jmpl    %r17, %r0           ! old PC
        rett    %r18                ! old nPC


To return to the instruction after the trapped instruction (e.g., when emulating an 
instruction) use the sequence: 


        jmpl    %r18, %r0           ! old nPC
        rett    %r18 + 4            ! old nPC + 4


Two example trap handlers are shown below. 


_rerun_Trap_Instr:
        andn    %l0, 0x20, %l0      ! Disable traps.
        wr      %l0, %psr
        or      %g0, 0x1, %g1       ! Set Restore Lock bit,
        or      %g0, 0x10, %l0      ! in case an autolock sequence
        sta     %g1, [%l0] 1        ! is in effect.
        jmpl    %l1, %g0            ! Return to instruction at pc.
        rett    %l2

! Return routine for skipping the trapped instruction.
!
_skip_Trap_Instr:
        andn    %l0, 0x20, %l0      ! Disable traps.
        wr      %l0, %psr
        or      %g0, 0x1, %g1       ! Set Restore Lock bit,
        or      %g0, 0x10, %l0      ! in case an autolock sequence
        sta     %g1, [%l0] 1        ! is in effect.
        jmpl    %l2, %g0            ! Return to instruction at npc.
        rett    %l2+4
_Inst_access:
_Illegal_instr:
_Privil_instr:
_fp_disable:
! FUNCTION
!       _win_ovf
! DESCRIPTION
!       This routine is the trap handler for register window overflow trap.
!       Priority: 0x06
!
!       Upon entry, the cwp points to the trap window, which is 1 less than
!       the register window that must be saved to the stack. The stack is
!       organized with %i6 = %o6 - (0x40 + local stack used). The ins and
!       locals are saved, and the wim is adjusted for the new window.
!
! INPUTS
!       - None.
!
! INTERNAL DESCRIPTION
!       - Move the invalid window to the next window by rotating the %wim
!         register left by one slot.
!       - Get into the previously invalid window, the one that caused the trap,
!         and save all of the registers in it.
!       - Get back into the previously valid window and let the trapped routine
!         execute the save again.
!
! RETURNS
!       - %o0 = 1 so execution starts at the trapped instruction.

win_overflow:
        or      %l0, 0x20, %l0      ! enable traps
        wr      %l0, %psr
        rd      %wim, %l4           ! Get wim at trap time.
        mov     %g1, %l7            ! Save %g1.
        srl     %l4, 1, %g1         ! Next WIM = %g1 =
                                    ! rol(WIM, 1, NWINDOW).
        sll     %l4, NWINDOWS-1, %l5
        or      %l5, %g1, %g1
        save                        ! Get into window to be saved.
        wr      %g1, %g0, %wim      ! Install new wim.
        nop                         ! must delay three instructions
        nop                         ! before using these registers, so
        nop                         ! put nops in just to be safe
        st      %l0, [%sp + 0x0 * 4] ! save all local and "in" registers
        st      %l1, [%sp + 0x1 * 4]
        st      %l2, [%sp + 0x2 * 4]
        st      %l3, [%sp + 0x3 * 4]
        st      %l4, [%sp + 0x4 * 4]
        st      %l5, [%sp + 0x5 * 4]
        st      %l6, [%sp + 0x6 * 4]
        st      %l7, [%sp + 0x7 * 4]
        st      %i0, [%sp + 0x8 * 4]
        st      %i1, [%sp + 0x9 * 4]
        st      %i2, [%sp + 0xa * 4]
        st      %i3, [%sp + 0xb * 4]
        st      %i4, [%sp + 0xc * 4]
        st      %i5, [%sp + 0xd * 4]
        st      %i6, [%sp + 0xe * 4]
        st      %i7, [%sp + 0xf * 4]
        restore                     ! Go back to trap window.
        mov     %l7, %g1            ! Restore %g1.

rerun_trap_instr:
        andn    %l0, 0x20, %l0      ! Disable traps.
        wr      %l0, %psr
        or      %g0, 0x1, %g1       ! Set Restore Lock bit,
        or      %g0, 0x10, %l0      ! in case an autolock sequence
        sta     %g1, [%l0] 1        ! is in effect.
        jmpl    %l1, %g0            ! Return to instruction at pc.
        rett    %l2

!
! FUNCTION
!       _win_unf
!
! DESCRIPTION
!       This routine is the trap handler for register window underflow trap.
!       Priority: 0x07
!
!       Upon entry, the cwp points to the trap window, which is 1 more than
!       the register window that must be restored from the stack. The stack
!       is organized with %i6 = %o6 - (0x40 + local stack used). The ins
!       and locals are restored, and the wim is adjusted for the new window.
!
! INPUTS
!       - None.
!
! INTERNAL DESCRIPTION
!
! RETURNS
!       - %o0 = 1 so execution starts at the trapped instruction.

win_underflow:
        or      %l0, 0x20, %l0      ! enable traps
        wr      %l0, %psr
        mov     %wim, %l4           ! Get wim.
        sll     %l4, 1, %l5         ! Next WIM = rol(WIM, 1, NWINDOW).
        srl     %l4, NWINDOWS-1, %l6
        or      %l6, %l5, %l6
        mov     %l6, %wim           ! Install it.
        nop                         ! must delay three instructions
        nop                         ! before using these registers, so
        nop                         ! put nops in just to be safe
        restore                     ! Back to user window.
        restore                     ! Get into window to be restored.
        ld      [%sp + 0x0 * 4], %l0 ! restore all registers
        ld      [%sp + 0x1 * 4], %l1
        ld      [%sp + 0x2 * 4], %l2
        ld      [%sp + 0x3 * 4], %l3
        ld      [%sp + 0x4 * 4], %l4
        ld      [%sp + 0x5 * 4], %l5
        ld      [%sp + 0x6 * 4], %l6
        ld      [%sp + 0x7 * 4], %l7
        ld      [%sp + 0x8 * 4], %i0
        ld      [%sp + 0x9 * 4], %i1
        ld      [%sp + 0xa * 4], %i2
        ld      [%sp + 0xb * 4], %i3
        ld      [%sp + 0xc * 4], %i4
        ld      [%sp + 0xd * 4], %i5
        ld      [%sp + 0xe * 4], %i6
        ld      [%sp + 0xf * 4], %i7
        save                        ! Get back to original window.
        save
rerun_trap_instr:
        andn    %l0, 0x20, %l0      ! Disable traps.
        wr      %l0, %psr
        or      %g0, 0x1, %g1       ! Set Restore Lock bit,
        or      %g0, 0x10, %l0      ! in case an autolock sequence
        sta     %g1, [%l0] 1        ! is in effect.
        jmpl    %l1, %g0            ! Return to instruction at PC.
        rett    %l2
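The WIM arithmetic in the two window handlers is a one-bit rotate within an NWINDOWS-bit field: the overflow handler combines a right shift by 1 with a left shift by NWINDOWS-1, and the underflow handler does the opposite. The sketch below models both. It assumes eight windows and masks the result explicitly, whereas the handlers rely on the WIM register keeping only its implemented bits; the function names are hypothetical.

```python
NWINDOWS = 8  # assumption: number of register windows
FIELD = (1 << NWINDOWS) - 1


def wim_after_overflow(wim):
    """srl wim,1 combined with sll wim,NWINDOWS-1: rotate right by one."""
    return ((wim >> 1) | (wim << (NWINDOWS - 1))) & FIELD


def wim_after_underflow(wim):
    """sll wim,1 combined with srl wim,NWINDOWS-1: rotate left by one."""
    return ((wim << 1) | (wim >> (NWINDOWS - 1))) & FIELD


print(bin(wim_after_overflow(0b00000001)))
print(bin(wim_after_underflow(0b10000000)))
```

The two operations are inverses, so an overflow followed by the matching underflow leaves the invalid-window marker where it started.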


5.3 Register and Stack Management 


This section describes the standard conventions for using the register file. Most 
SPARC compilers comply with this convention as this is the standard adopted on 
SPARC workstations. (Compilers are available that optimize code differently for 
embedded applications if required.) 




5.3.1 Registers 


Register usage is typically a critical resource allocation issue for compilers. The 
SPARClite architecture provides windowed integer registers (in, out, local), and 
global integer registers. Figure 5-2 summarizes the SPARC register set, as seen by
a user-mode procedure. 





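in      %i7       (%r31)  return address - 8 †
        %fp, %i6  (%r30)  frame pointer †
        %i5       (%r29)  incoming param 5 †
        %i4       (%r28)  incoming param 4 †
        %i3       (%r27)  incoming param 3 †
        %i2       (%r26)  incoming param 2 †
        %i1       (%r25)  incoming param 1 †
        %i0       (%r24)  incoming param 0 †
local   %l7       (%r23)  local 7 †
        %l6       (%r22)  local 6 †
        %l5       (%r21)  local 5 †
        %l4       (%r20)  local 4 †
        %l3       (%r19)  local 3 †
        %l2       (%r18)  local 2 †
        %l1       (%r17)  local 1 †
        %l0       (%r16)  local 0 †
out     %o7       (%r15)  temporary value / address of CALL instruction ‡
        %sp, %o6  (%r14)  stack pointer †
        %o5       (%r13)  outgoing param 5 ‡
        %o4       (%r12)  outgoing param 4 ‡
        %o3       (%r11)  outgoing param 3 ‡
        %o2       (%r10)  outgoing param 2 ‡
        %o1       (%r9)   outgoing param 1 ‡
        %o0       (%r8)   outgoing param 0 ‡
global  %g7       (%r7)   global 7 (SPARC ABI: use reserved)
        %g6       (%r6)   global 6 (SPARC ABI: use reserved)
        %g5       (%r5)   global 5 (SPARC ABI: use reserved)
        %g4       (%r4)   global 4 (SPARC ABI: global register variable) ‡
        %g3       (%r3)   global 3 (SPARC ABI: global register variable) ‡
        %g2       (%r2)   global 2 (SPARC ABI: global register variable) ‡
        %g1       (%r1)   temporary value ‡
        %g0       (%r0)   0
state   %y                Y register (used in multiplication/division) ‡
        (icc field of %psr)  Integer condition codes ‡


†. assumed by caller to be preserved across a procedure call
‡. assumed by caller to be destroyed (volatile) across a procedure call.


Figure 5-2. SPARC Register Set, as Seen by a User-Mode Procedure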


In and Out Registers 


The in and out registers are used primarily for passing parameters to subroutines 
and receiving results from them, and for keeping track of the memory stack. 




Certain routines can also use out registers 0 through 5 as fast temporary storage; 
these include leaf routines—which contain no procedure calls—and routines 
which pass parameters using only shared memory or global registers. In general, 
when a procedure is called, the caller’s outs become the callee’s ins. 


One of a procedure's out registers (%o6) is used as its stack pointer, %sp. It points
to an area in which the system can store %r16 ... %r31 (%l0 ... %i7) when the
register file overflows (window_overflow trap); it is used to address most values
located on the stack. See Figure 5-3. A trap can occur at any time, which may
precipitate a subsequent window_overflow trap, during which the contents of the
user's register window at the time of the original trap are spilled to the memory to
which its %sp points.


A procedure may store temporary values in its out registers, with the exception of
%sp, with the understanding that those values are volatile across procedure calls.
%sp cannot be used for temporary values for the reasons described in the Register
Windows and %sp section below.


Up to six parameters can be passed by placing them in out registers %o0 ... %o5;
additional parameters are passed in the memory stack. The stack pointer is
implicitly passed in %o6, and a CALL instruction places its own address in %o7.


When an argument is a data aggregate being passed by value, the caller first 
makes a temporary copy of the data aggregate in its stack frame, then passes a 
pointer to the copy in the argument out register (or on the stack, if it is the 7th or 
later argument). 


After a callee is entered and its SAVE instruction has been executed, the caller’s 
out registers are accessible as the callee’s in registers. 


The caller’s stack pointer %sp (%o6) automatically becomes the current
procedure’s frame pointer %fp (%i6) when the SAVE instruction is executed.


The callee finds its first six parameters in %i0 ... %i5, and the remainder (if any) 
on the stack. 


For each passed-by-value data aggregate, the callee finds a pointer to a copy of 
the aggregate in its argument list. The compiler must arrange for an extra derefer- 
encing operation each time such an argument is referenced in the callee. The addi- 
tional code in the callee program uses the pointer to access aggregate values on 
the stack. 


If the callee is passed fewer than six parameters, it may store temporary values in 
the unused in registers. 


If a register parameter (in %i0 ... %i5) has its address taken in the called
procedure, the callee stores that parameter’s value on the memory stack. The
parameter is then accessed in that memory location for the lifetime of the
pointer(s) which contain its address (or for the lifetime of the procedure, if the
compiler doesn’t know the pointer’s lifetime).


The six words available on the stack for saving the first six parameters are deliber- 
ately contiguous in memory with those in which additional parameters may be 
passed. This supports constructs such as C’s varargs, for which the callee copies to 
the stack the register parameters which must be addressable. 
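
Because the six register-argument words sit directly below the words used for the seventh and later arguments, every argument has a stack address that follows a single formula. The C sketch below works that out from the frame layout given in section 5.3.2 (16 window-save words, one hidden word, six argument words). The function and constant names are invented for the illustration, and 4-byte words are assumed.

```c
#include <stddef.h>

enum {
    WINDOW_SAVE_WORDS = 16,  /* in and local registers */
    HIDDEN_WORDS      = 1,   /* address for an aggregate return value */
    REG_ARG_WORDS     = 6    /* stack words for the register arguments */
};

/* Byte offset, from the callee's %fp, of the stack word that belongs to
 * argument k (0-based). For k < 6 the word holds the argument only if the
 * callee has stored it there; for k >= 6 the caller wrote it before CALL. */
size_t arg_home_offset(unsigned k)
{
    return (size_t)(WINDOW_SAVE_WORDS + HIDDEN_WORDS + k) * 4u;
}

/* Nonzero if argument k travels in a register (%o0..%o5 in the caller). */
int arg_in_register(unsigned k)
{
    return k < REG_ARG_WORDS;
}
```

Under these assumptions the first argument's word is at %fp+68 and the seventh argument is at %fp+92, so a varargs-style callee that stores %i0 ... %i5 into their words can then step through all arguments with one pointer.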


A function returns a scalar integer value by writing it into its ins (which are the
caller’s outs), starting with %i0. Aggregate values are returned using the
mechanism described in the Functions Returning Aggregate Values section.


A procedure’s return address, normally the address of the instruction just after 
the CALL’s delay-slot instruction, is simply calculated as %i7 + 8. 


Local Registers 


The locals are used for automatic variables—those whose lifetimes are no longer 
than the lifetimes of their containing procedures—and for most temporary values. 
For access efficiency, a compiler may also copy parameters (i.e., those past the 
sixth) from the memory stack into the locals and use them from there. Procedures 
only calling several leaf routines may be more efficient if some of the procedure's 
automatic variables are referenced by their address rather than have the values 
passed for each leaf routine call and return. If an automatic variable’s address is 
taken, the variable’s value must be stored in the memory stack, and be accessed 
there for the lifetime of the pointer(s) which contains its address (or for the life- 
time of the procedure, if the compiler doesn’t know the pointer’s lifetime). 


If a routine creates variables that can be used by other called routines, these vari- 
ables should either be stored in the memory stack and referenced by pointers, or 
stored in the global registers, unless the register window does not change when 
the other routines are called. 


Register Windows and %sp 


Some caveats about the use of %sp and the SAVE and RESTORE instructions are 
appropriate. It is essential that: 


• %sp always contains the correct value, so that when (and if) a register window
overflow or underflow trap occurs, the register window can be correctly
stored to or reloaded from memory.


• User (non-supervisor) code use SAVE and RESTORE instructions carefully. In
particular, “walking” the call chain through the register windows using
RESTOREs, expecting to be able to return to where one started using SAVEs,
does not work as one might suppose. This fails because the “next” register
window (in the “SAVE direction”) is reserved for use by trap handlers. Since
non-supervisor code cannot disable traps, a trap could write over the contents
of a user register window which has “temporarily” been RESTORE’d.


For example, if a routine at the fourth calling level returns to its caller at third 
level and restores the third-level window, an intervening trap at third level 
can change registers in the fourth-level window. A subsequent call and SAVE 
to a routine at fourth level will not find the register contents the same as they 
were on exit from the last fourth-level routine. 


The safe method is to flush the register windows out to user memory (the 
stack) in supervisor state using a software trap designed for that purpose. 
Then, user code can safely “walk” the call chain through user memory, instead 
of through the register windows. 


The rule-of-thumb which will avoid such problems is to consider all memory 
below %sp on the user’s stack, and the contents of all register windows “below” 
the current one to be volatile. Below means decreasing memory address and win- 
dow pointer, corresponding to call space of subsequent routines by the current 
routine. In embedded control applications complex enough to require partition- 
ing the process into re-usable tasks driven by a master sequencer, this view can be 
critical to ensure correct functioning in all cases. 


Global Registers 


Unlike the ins, locals, and outs, the globals are not part of any register window. The
globals are a set of eight registers with global scope, like the register sets of more
traditional processor architectures. The globals (except %g0) are conventionally
assumed to be volatile across procedure calls. However, if they are used on a
per-procedure basis and expected to be non-volatile across procedure calls, either
the caller or the callee has to take responsibility for saving and restoring their
contents.


Global register %g0 has a “hardwired” value of zero. It always reads as zero, and
writes to it have no effect.


The global registers other than %g0 can be used for temporaries, global variables, 
or global pointers—either user variables, or values maintained as part of the pro- 
gram’s execution environment. For example, one could use globals in the execu- 
tion environment by establishing a convention that global scalars are addressed 
via offsets from a global base register. In the general case, memory accessed at an 
arbitrary address requires two instructions, e.g.: 


        sethi   %hi(address), reg
        ld      [reg+%lo(address)], reg


Use of a global base register for frequently accessed global values would provide
faster (single-instruction) access to 2¹³ bytes of those values, e.g.:


        ld      [%gn+offset], reg


Global register n would hold the address of the center of a block of global values. 
The offset, varying from -4096 to 4095 bytes, would point to a particular value. 
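
The arithmetic behind the two addressing forms can be checked with a few lines of C. This is only a sketch of the address calculation, not assembler output; the function names are made up for the example. %hi() is taken as the upper 22 bits of a 32-bit address and %lo() as the lower 10 bits, and the single-instruction form is limited by the signed 13-bit immediate field.

```c
#include <stdint.h>

/* %hi(x): the upper 22 bits, the value SETHI places in bits 31..10. */
uint32_t hi22(uint32_t x)
{
    return x >> 10;
}

/* %lo(x): the lower 10 bits, used as the immediate of the following LD. */
uint32_t lo10(uint32_t x)
{
    return x & 0x3ffu;
}

/* Address formed by the SETHI/LD pair: (hi << 10) + lo. */
uint32_t sethi_plus_lo(uint32_t x)
{
    return (hi22(x) << 10) + lo10(x);
}

/* Nonzero if offset fits the signed 13-bit immediate of one LD, that is,
 * if the value can be reached directly from a global base register. */
int fits_simm13(int32_t offset)
{
    return offset >= -4096 && offset <= 4095;
}
```

The two halves always recombine to the original address, while `fits_simm13()` shows why the base register is best pointed at the center of the block: offsets on both sides of it are usable.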


The current convention is that the global registers (except %g0) are assumed to be
volatile across procedure calls. The convention used by the SPARC Application
Binary Interface (ABI) is that %g1 is assumed to be volatile across procedure calls,
%g2 ... %g4 are reserved for use by the application program (for example, as
global register variables), and %g5 ... %g7 are assumed to be nonvolatile and
reserved for (as-yet-undefined) use by the execution environment.


5.3.2 Memory Stack 


Space on the memory stack, called a stack frame, is normally allocated for each 
procedure. Under certain conditions, optimization may enable a leaf procedure to 
use its caller’s stack frame instead of one of its own. In that case, the leaf proce- 
dure allocates no space of its own for a stack frame. The following description of 
the memory stack applies to all procedures, except leaf procedures which have 
been optimized as shown in 5.3.4. 


The following are always allocated at compile time in every procedure’s stack 
frame: 


• 16 words, always starting at %sp, for saving the procedure’s in and local
registers, should a register window overflow occur.


The following are allocated at compile time in the stack frames of non-leaf proce- 
dures: 


• One word, for passing a “hidden” (implicit) parameter. This is used when the
caller is expecting the callee to return a data aggregate by value; the hidden
word contains the address of stack space allocated (if any) by the caller for that
purpose. See the section titled Functions Returning Aggregate Values.


• Six words, into which the callee may store parameters that must be
addressable.


Space is allocated as needed in the stack frame for the following at compile time:


• Outgoing parameters beyond the sixth.


• All automatic arrays, automatic data aggregates, automatic scalars which must
be addressable, and automatic scalars for which there is no room in registers.


• Compiler-generated temporary values (typically when there are too many for
the compiler to keep them all in registers).


Space can be allocated dynamically (at runtime) in the stack frame for the
following:


• Memory allocated using the alloca() function of the C library


Addressable automatic variables on the stack are addressed with negative offsets
relative to %fp; dynamically allocated space is addressed with positive offsets
from the pointer returned by alloca(); everything else in the stack frame is
addressed with positive offsets relative to %sp.


The stack pointer %sp must always be doubleword-aligned. This allows window
overflow and underflow trap handlers to use the more efficient STD and LDD
instructions to store and reload register windows.
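
The fixed allocations and the alignment rule together determine the amount a SAVE instruction must subtract from %sp. The C sketch below reuses the SA and MINFRAME definitions that appear in the sum3 example of section 5.3.4. The `frame_size()` helper and its parameters are assumptions for illustration; a real compiler may lay out its frame differently.

```c
#define WORD 4u

/* Round x up to a doubleword (8-byte) boundary. */
#define SA(x) (((x) + 7u) & ~7u)

/* 16 window-save words + 1 hidden word + 6 argument words = 92 bytes. */
#define MINFRAME ((16u + 1u + 6u) * WORD)

/* Bytes to allocate for a non-leaf procedure's stack frame.
 *   extra_out_words: outgoing parameters past the sixth
 *   local_bytes:     addressable automatics and compiler temporaries
 * The sum is rounded up so that %sp stays doubleword-aligned. */
unsigned frame_size(unsigned extra_out_words, unsigned local_bytes)
{
    return SA(MINFRAME + extra_out_words * WORD + local_bytes);
}
```

With no extra parameters or locals the result is 96 bytes: the 92-byte minimum rounded up to the next multiple of eight.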


Figure 5-3 illustrates the stack frame of an active non-leaf procedure. 


   Addressed from      Contents (highest memory addresses first)

                       Previous Stack Frame

   %fp (old %sp)       ------------------------------------------------------

                       Current Stack Frame

   %fp - offset        Space (if needed) for automatic arrays, aggregates,
                       and addressable scalar automatics

   alloca()            Space dynamically allocated via alloca(), if any

   %sp + offset        Space (if needed) for compiler temporaries

   %sp + offset        Outgoing parameters past the sixth, if any

   %sp + offset        6 words into which callee may store register
                       arguments

   %sp + offset        One-word hidden parameter (address at which callee
                       should store aggregate return value)

   %sp                 16 words in which to save register window (in and
                       local registers)

                       ------------------------------------------------------

                       Next Stack Frame (not yet allocated)

   Stack growth is toward decreasing memory addresses.


Figure 5-3. User Stack Frame


5.3.3 Functions Returning Aggregate Values 


Some programming languages, including C, dialects of Pascal, and Modula-2,
allow the user to define functions that return aggregate values. Examples include
a C struct or union, or a Pascal record. Since such a value may not fit into the
registers, another value-returning protocol must be defined to return the result in
memory.


Re-entrancy and efficiency considerations require that the memory used to hold
such a return value be allocated by the function’s caller. The address of this
memory area is passed as the one-word hidden parameter mentioned in section 5.3.2
“Memory Stack”, above. Where it is known that re-entrancy is not required, global
or shared memory allocated by the master sequencer can be an effective
alternative, especially if the amount of memory required is small enough to be
held in locked data cache.


Because of the lack of type safety in the C language, a function should not assume 
that its caller is expecting an aggregate return value and has provided a valid 
memory address. Thus, some additional handshaking is required. 


When a procedure expecting an aggregate return value from a called function is 
compiled, an UNIMP instruction is placed after the delay-slot instruction follow- 
ing the CALL to the function in question. The immediate field in this UNIMP 
instruction contains the low-order twelve bits of the size (in bytes) of the area allo- 
cated by the caller for the aggregate value expected to be returned. 


When the aggregate-returning function is about to store its value in the memory 
allocated by its caller, it first tests for the presence of this UNIMP instruction in its 
caller’s instruction stream. If it is found, the callee assumes the hidden parameter 
to be valid, stores its return value at the given address, and returns control to the 
instruction following the caller’s UNIMP instruction. If the UNIMP instruction is 
not found, the hidden parameter is assumed not to be valid and no value is 
returned. 


On the other hand, if a scalar-returning function is called when an aggregate 
return value is expected (which is clearly a software error), the function returns as 
usual, executing the UNIMP instruction, which causes an unimplemented- 
instruction trap. 
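
The handshake can be modeled in C as a check on the instruction word found at %i7 + 8. The sketch below relies on the UNIMP encoding (op = 0 and op2 = 0, with a 22-bit constant field). The function names are invented here, and comparing the constant against the expected size, rather than merely detecting UNIMP, is an assumption about how a callee might be written.

```c
#include <stdint.h>

/* UNIMP has op (bits 31:30) = 0 and op2 (bits 24:22) = 0. */
int is_unimp(uint32_t insn)
{
    return (insn & 0xC1C00000u) == 0u;
}

/* Decide how an aggregate-returning callee should return.
 *   insn_after_delay_slot: the word at %i7 + 8 in the caller
 *   aggregate_size:        size in bytes the callee intends to store
 * Returns 12 (store the value, then skip the UNIMP) when the handshake
 * matches, or 8 (ordinary return, nothing stored) when it does not. */
unsigned aggregate_return_offset(uint32_t insn_after_delay_slot,
                                 uint32_t aggregate_size)
{
    uint32_t expected = aggregate_size & 0xfffu;  /* low-order twelve bits */

    if (is_unimp(insn_after_delay_slot) &&
        (insn_after_delay_slot & 0x3fffffu) == expected)
        return 12u;
    return 8u;
}
```

A caller expecting a 24-byte structure would place `unimp 24` after the delay slot. The model then returns 12, whereas any other instruction in that position (a NOP, for instance) yields the ordinary offset of 8.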


5.3.4 Leaf Procedure Optimization 


A leaf procedure is one that is a “leaf” in the program’s call graph; that is, one that 
does not call (e.g. via CALL or JMPL) any other procedures. 


Each procedure, including leaf procedures, normally uses a SAVE instruction to 
allocate a stack frame and obtain a register window for itself, and a corresponding 
RESTORE instruction to de-allocate it. The time costs associated with this are: 


• Possible generation of register-window overflow/underflow traps at runtime.
This only happens occasionally, but when either underflow or overflow does
occur, it costs dozens of machine cycles to process.


• The two cycles expended by the SAVE and RESTORE instructions themselves


There are also space costs associated with this convention, the cumulative cache
effects of which may not be negligible. The space costs include:


• The space occupied on the stack by the procedure’s stack frame
• The two words occupied by the SAVE and RESTORE instructions


Of the above costs, the trap-processing cycles are typically the most significant.


Some leaf procedures can be made to operate without their own register window
or stack frame, using their caller’s instead. This can be done when the candidate
leaf procedure meets all of the following conditions:


• Contains no references to %sp, except in its SAVE instruction
• Contains no references to %fp
• Refers to (or can be made to refer to) no more than 8 of the 32 integer registers,
inclusive of %o7 (the “return address”).


Such procedures can be converted into routines which share the caller’s stack
frame and register window, an optimization that saves both time and space.
When optimized, such a procedure is known as an optimized leaf procedure. It
may only safely use registers that its caller already assumes to be volatile across a
procedure call, namely, %o0 ... %o5, %o7, and %g1.


The optimization can be performed at the assembly-language level using the
following steps:


• Change all references to registers in the procedure to registers that the caller
assumes volatile across the call:

    • Leave references to %o7 unchanged.

    • Leave any references to %g0 ... %g7 unchanged.

    • Change %i0 ... %i5 to %o0 ... %o5, respectively. If an in register is changed
    to an out register that was already referenced in the original unoptimized
    version of the procedure, all original references to that out register must be
    changed to refer to an unused out or global register.

    • Change references to each local register into references to any register
    among %o0 ... %o5 or %g1 that remains unused.


• Delete the SAVE instruction. If it was in a delay slot, replace it with a NOP
instruction. If its destination register was not %g0 or %sp, convert the SAVE
into the corresponding ADD instruction instead of deleting it.


• If the RESTORE’s implicit addition operation is used for a productive purpose
(such as setting up the procedure’s return value), convert the RESTORE to the
corresponding ADD instruction. Otherwise, the RESTORE is only used for
stack and register-window de-allocation; replace it with a NOP instruction (it
is probably in the delay slot of the RET, and so cannot be deleted).


• Change the RET (return) synthetic instruction to RETL (return-from-leaf-
procedure synthetic instruction).


• Perform any optimizations newly made possible, such as combining
instructions, or filling the delay slot of the RETL with a productive instruction.


After the above changes, there should be no SAVE or RESTORE instructions, and
no references to in or local registers in the procedure body. All original references
to ins are now to outs. All other register references are to either %g1, or other outs.


Costs of optimizing leaf procedures in this way include:


• Additional intelligence in the peephole optimizer to recognize and optimize
candidate leaf procedures.


• Additional intelligence in debuggers to properly report the call chain and the
stack traceback for optimized leaf procedures.


The following code fragment shows a simple procedure call with a value 
returned, and the procedure itself: 


! CALLER:
!       int i;                  /* compiler assigns “i” to register %l7 */
!       i = sum3( 1, 2, 3 );

        mov     1, %o0          ! first arg to sum3 is 1
        mov     2, %o1          ! second arg to sum3 is 2
        call    sum3            ! the call to sum3
        mov     3, %o2          ! last parameter to sum3 in delay slot
        mov     %o0, %l7        ! copy return value to %l7 (variable “i”)

#define SA(x) (((x)+7)&(~0x07)) /* rounds “x” up to doubleword boundary */
#define MINFRAME ((16+1+6)*4)   /* minimum size frame */

! CALLEE:
!       int sum3( a, b, c )
!       int a, b, c;            /* args received in %i0, %i1, and %i2 */
!       {
!               return a+b+c;
!       }

sum3:
        save    %sp, -SA(MINFRAME), %sp ! set up new %sp; alloc min. stack frame
        add     %i0, %i1, %l7   ! compute sum in local %l7
        add     %l7, %i2, %l7   ! (or %i0 could have been used directly)
        ret                     ! return from sum3, and...
        restore %l7, 0, %o0     ! move result into output reg & restore


Since “sum3” does not call any other procedures (i.e., it is a “leaf” procedure), it
can be optimized to become:


sum3:
        add     %o0, %o1, %o0   !
        retl                    ! (must use RETL, not RET,
        add     %o0, %o2, %o0   !  to return from leaf procedure)


If a leaf routine is being created at the assembly level for use in an environment
such as embedded control where all the caller routines are known, then a
different approach can be taken.


Form a register map which identifies all of the in and local registers which contain
information to be used by the leaf routine. Additionally, to accommodate the
most restrictive of caller routines, identify those in and local registers which must
be preserved for the caller.


Initially attempt to write the leaf routine so that it changes only out and global
registers, but uses information in the in and local registers. If the code requires
storing temporary values in memory and retrieving them later in the routine, or
regenerating a value in a register later in the routine because the register was
overwritten to hold some other value, then examine the in and local registers to
see if any of them can be changed by the leaf routine.


If so, modify the routine appropriately. If not, or if after modification there is still
temporary memory use or register value regeneration, try to relax the restrictions
of caller routines by changing code to regenerate some of the variables saved in
registers.

Usually leaf routines are associated with inner loops and are executed much more 
frequently than the routines that call them. Total program performance will be 
improved with the most efficient inner loops and leaf routines, even at the 
expense of less efficient outer-loop and set-up routines. 


The following short function code shows an example of a leaf routine written 
directly at the assembly level and satisfying the requirements for safe calling by 
other routines: 


/*RGB_I
 *
 *Convert red, green, blue pixel planes to intensity pixel plane:
 *      Y(i,j) = [aA(i,j) + bB(i,j) + cC(i,j)] / 256
 *
 * Since there is no distinction between the i and j indexes as
 * used by this process, the arrays can be accessed linearly with
 * a single index that runs through the total 512 by 512 pixel
 * space. i = 511 -> 0, j = 511 -> 0. Each pixel is one byte.
 *
 *Inputs:  i0  base address Y(0,0)
 *         i1  base address A(0,0)
 *         i2  base address B(0,0)
 *         i3  base address C(0,0)
 *         g7  pointer to Scalar Constant Array Base for a,b,c and other
 *             constants.
 *
 *Outputs: Y(i,j)
 *
 *Time:    3932169 + 458753W cycles,
 *         where W is number of wait states for DRAM access of data.
 *
 *REGISTER MAP:
 *
 *  i0  [Y(0,0)]         l0      o0  aA+bB+cC, Y(i,j)    g0  0
 *  i1  [A(0,0)]         l1      o1  A(i,j)              g1  a
 *  i2  [B(0,0)]         l2      o2  B(i,j)              g2  b
 *  i3  [C(0,0)]         l3      o3  C(i,j)              g3  c
 *  i4                   l4      o4  bB, cC              g4
 *  i5                   l5      o5  512*512-1 -> 0      g5  [Y(0,0)]+1
 *  i6  fp               l6      o6  sp                  g6
 *  i7  general return   l7      o7  leaf return         g7  SCAB
 *
 *The following instructions take one cycle unless otherwise noted.
 */

RGB_I:
        sethi   256,%o5             ! preset index to last pixel for fetch.
        sub     %o5,1,%o5           ! start at end & work toward beginning.
        add     %i0,1,%g5           ! offset store base to compensate for
                                    ! fetch index being ahead one pass
                                    ! of store index
        ldub    [%g7+cnsta],%g1     ! get weighting coefficients
        ldub    [%g7+cnstb],%g2     ! 1+W cycles for 1st byte - cache miss.
        ldub    [%g7+cnstc],%g3     ! 1 cycle each for rest - cache hit.

/*inner loop begin*/

1:
        ldub    [%i1+%o5],%o1       ! fetch A. 1+W cycles for 1st byte.
                                    ! 1 cycle for remaining 3 bytes in word.
        umul    %o1,%g1,%o0         ! 2 cycles for byte multiplier
        ldub    [%i2+%o5],%o2       ! fetch B. 1+W/4 cycles.
        umul    %o2,%g2,%o4         ! 2 cycles.
        add     %o0,%o4,%o0         ! update accumulator
        ldub    [%i3+%o5],%o3       ! fetch C. 1+W/4 cycles.
        umul    %o3,%g3,%o4         ! 2 cycles.
        add     %o0,%o4,%o0         ! update accumulator
        sra     %o0,8,%o0           ! scale sum of products to form Y
        subcc   %o5,1,%o5           ! decrement & test index
        bg      1b                  ! loop if index >0
        stb     %o0,[%g5+%o5]       ! store Y using offset base since
                                    ! index has decremented.
                                    ! 1+W cycles - always cache miss.

/*inner loop end*/

        retl                        ! 2 cycles
        nop                         ! exit
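
For comparison, the computation performed by the inner loop can be written in C. This is a reference sketch, not a translation of the listing: the function name and parameter order are invented, the planes are passed as plain byte arrays, and the loop here visits every index from n-1 down to 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Y[k] = (a*A[k] + b*B[k] + c*C[k]) / 256 for each pixel k.
 * The weights are assumed to sum to at most 256 so that each result
 * fits in one byte. */
void rgb_to_intensity(uint8_t *y,
                      const uint8_t *pa, const uint8_t *pb, const uint8_t *pc,
                      size_t n, uint8_t a, uint8_t b, uint8_t c)
{
    size_t k = n;

    while (k-- > 0) {               /* run from the last pixel backwards */
        uint32_t acc = (uint32_t)a * pa[k]
                     + (uint32_t)b * pb[k]
                     + (uint32_t)c * pc[k];
        y[k] = (uint8_t)(acc >> 8); /* scale sum of products to form Y */
    }
}
```

Calling it with n = 512*512 covers one full image. Comparing its output against the assembly routine on a test image is one way to validate a hand-tuned version.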


5.3.5 Register Allocation Within a Window 


The usual SPARC software convention is to allocate eight registers (%l0-%l7) for
local values. A compiler could allocate more registers for local values at the
expense of having fewer outs/ins available for argument passing.


For example, instead of assuming that the boundary between local values and
input arguments is between r[23] and r[24] (%l7 and %i0), software could by
convention assume that the boundary is between r[25] and r[26] (%i1 and %i2). As
illustrated in Table 5-1, this would provide 10 registers for local values and 6
“in”/“out” registers.


Table 5-1: Alternative Register Allocation

                                    Standard     “10-Local”    Arbitrary
                                    Register     Register      Register
                                    Model        Model         Model

registers for local values             8            10            n

“in”/“out” registers:
    reserved for %sp/%fp               1             1            1
    reserved for return address        1             1            1
    available for arg passing          6             4          14-n

total “ins”/“outs”                     8             6          16-n
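
The trade-off between local registers and argument registers is simple arithmetic on the 16 registers that belong to each window. The C sketch below works it out for an arbitrary number of locals. The type and function names are hypothetical, and the only facts used are that one in/out register is reserved for %sp/%fp and one for the return address.

```c
typedef struct {
    unsigned locals;    /* registers used for local values */
    unsigned in_out;    /* total "in"/"out" registers */
    unsigned arg_regs;  /* of those, available for argument passing */
} window_alloc;

/* Fill *out for a model with n_locals local registers. Returns 0 on
 * success, or -1 if too few registers remain for %sp/%fp and the
 * return address. */
int alloc_window(unsigned n_locals, window_alloc *out)
{
    if (n_locals > 14u)
        return -1;
    out->locals   = n_locals;
    out->in_out   = 16u - n_locals;
    out->arg_regs = out->in_out - 2u;   /* minus %sp/%fp and return address */
    return 0;
}
```

The standard model (8 locals) leaves six argument registers, and the “10-Local” model leaves four.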


5.3.6 Other Register and Window Usage Models 


In general-purpose computers, procedure calls are assumed to be frequent rela- 
tive to both context switches and User-Supervisor state transitions. A primary 
goal in these applications is to minimize total overhead, which includes time 
spent in both context switches and procedure calls. As more register windows are 
shared among competing processes, total procedure call time decreases (due to 
execution of fewer window overflow and underflow traps), while total context- 
switch time may increase (the average number of register windows saved during 
a context switch increases). The task is to strike a balance to minimize the sum of
these two factors.


In embedded and/or real-time systems, the following factors are often more 
important than total overhead: 


• Minimal average context-switch time

• A constant (or small worst-case deterministic) context-switch time

• A constant (or small worst-case deterministic) procedure-call time

In these cases, it can be worthwhile to use a different scheme for managing the
SPARC register windows than the standard one described so far. This section
provides a few examples of modifications that can be made to the standard
conventions. You can then design a register-usage scheme appropriate to the
specific needs of your application.


1. Divide the register file into “supervisor mode” register windows and “user 
mode” register windows. In cases where user/supervisor transitions are fre- 
quent, this will reduce register-window overflow and underflow overhead. 


To be effective in a workstation environment, where the coding style is
characterized by deep nesting of procedure calls, such a scheme would require a
SPARC implementation with at least 14 windows in hardware (a minimum of
7 for user code plus 7 for supervisor code). In embedded control, however, the
nesting of procedure calls is typically shallow, and windows will be used
more sparingly.


2. Use multiple 1’s in the Window Invalid Mask Register (WIM) to partition the 
register file into groups of at least two registers each. Assign each group of 
registers to an executing task. This technique can be useful in real-time pro- 
cessing, where extremely fast context switches are desirable. A context switch 
would consist of loading a new stack pointer, resetting the CWP to the new 
task’s block of register windows, and saving and restoring whatever subset of 
the global registers is assumed to be nonvolatile. In particular, note that no 
window registers would need to be loaded or stored during a context switch. 





This technique assumes that only a few tasks are present, and, in the simplest 
case, that all tasks share a single address space. The number of hardware regis- 
ter windows required is a function of the number of windows reserved for the 
supervisor, the number of windows reserved for each task, and the number of 
tasks. Register windows could be allocated to tasks unequally, if appropriate. 


3. Avoid the normal register-window mechanism, by not using SAVE and 
RESTORE instructions. Software would effectively see 32 general-purpose 
registers instead of SPARC’s usual windowed register file. In this mode, 
SPARC would operate like processors with a more traditional flat register 
architecture. Procedure call times would be more deterministic (since there 
would be no window overflow or underflow traps), but for most types of soft- 
ware, average procedure call time would significantly increase, due to 
increased memory traffic for parameter passing and saving and restoring local 
variables. 


A number of existing SPARC compilers produce code using this register orga- 
nization. 


It would be awkward, at best, to attempt to mix (link) code using the
SAVE/RESTORE convention with code not using it in the same process. If both
conventions were used in the same system, two versions of each library would be
required.


It would be possible to run user code with one register-usage convention and 
supervisor code with another. With sufficient intelligence in the supervisor, user 
processes with different register conventions could be run simultaneously. 
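
As an illustration of the second scheme above (multiple 1’s in the WIM), the following C sketch builds a WIM value from a list of group sizes. It is a sketch under stated assumptions: the function name is invented, each group is assumed to give up its lowest-numbered window as the invalid one, and the groups are assumed to tile the register file exactly. An actual system would choose these details to suit its trap handlers.

```c
#include <stdint.h>

/* Build a WIM value with one invalid window at the low end of each group.
 *   group_sizes[i]: number of windows in group i, invalid window included
 *   ngroups:        number of groups (supervisor and tasks)
 *   nwindows:       number of windows implemented in hardware (at most 32)
 * Returns 0 if any group has fewer than two windows or if the groups do
 * not exactly fill the register file. */
uint32_t build_wim(const unsigned *group_sizes, unsigned ngroups,
                   unsigned nwindows)
{
    uint32_t wim = 0;
    unsigned base = 0;
    unsigned i;

    if (nwindows > 32u)
        return 0;
    for (i = 0; i < ngroups; i++) {
        if (group_sizes[i] < 2u || group_sizes[i] > nwindows - base)
            return 0;
        wim |= (uint32_t)1 << base;   /* mark the group's boundary window */
        base += group_sizes[i];
    }
    return (base == nwindows) ? wim : 0;
}
```

For an eight-window part split into groups of 2, 3, and 3 windows, this yields a WIM of 0x25 (windows 0, 2, and 5 marked invalid). A context switch then only needs to move the CWP into the new task's group.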





5.4 Cache Management


Effective cache usage is based on the following principles:


• Compactness of Code: Critical loops should fit entirely in the cache. They can
then be locked into the cache to prevent their being displaced when other,
less-often-used routines are called. In some cases, it may be advisable to disable
compiler in-lining optimizations in order to keep your code compact.


• Program Profiling: Knowing where your program spends its time will help
you decide what instructions and data to lock into cache.


• Data and Instruction Locality: If possible, a large program or data set should be
partitioned in such a way that one portion at a time can be locked into cache
and used for a while before another portion needs to be loaded. For example,
there are numerical routines which perform as many of their required
computations as possible on one block of data before proceeding to the next
block.
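
The locality principle can be illustrated with a routine that finishes all of its work on one block before moving to the next. In the C sketch below, the block size, the function name, and the two passes themselves are placeholders chosen for the example; the point is the loop structure, not the arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 1024u  /* assumed size of the region held in cache */

/* Apply two passes (scale by scale_num/256, then add offset) one block
 * at a time, so each block is reused while it is resident instead of
 * streaming through the whole array twice. */
void scale_then_offset(uint8_t *data, size_t n,
                       uint8_t scale_num, uint8_t offset)
{
    size_t start, i, end;

    for (start = 0; start < n; start += BLOCK_BYTES) {
        end = (n - start < BLOCK_BYTES) ? n : start + BLOCK_BYTES;

        for (i = start; i < end; i++)      /* pass 1 on this block */
            data[i] = (uint8_t)((data[i] * scale_num) >> 8);

        for (i = start; i < end; i++)      /* pass 2 on the same block */
            data[i] = (uint8_t)(data[i] + offset);
    }
}
```

Two passes this simple could of course be fused into one loop. The blocked form matters when the passes cannot be merged, for example when the second pass needs results from the whole block.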


5.5 Division Routines Using the DIVScc Instruction 


This section shows how integer division routines can be created using the DIVScc 
instruction. Signed and unsigned divisions are included for both word and dou- 
bleword dividends. The divisor is always a single word. These routines can serve 
as models for your own use of DIVScc, or they can be incorporated into your pro- 
grams and used without modification. These sample routines do not set the inte- 
ger condition codes in exactly the same way as the SPARC Version 8 integer 
division instructions. 


5.5.1 Simple Divide Step Examples 


In each of the following examples, a cycle by cycle view of divide step with 
reduced word size (3 bits) is given 


Register Use: 

outQO most significant half Dividend/ Remainder 

outl least significant half Dividend/ Quotient 

out2 Divisor 

Note: TS, True Sign = N xor V from condition codes 

Note: adjustment of negative quotient is also 
conditional on remainder. Details omitted 
here. See signed division example code. 


Examples of SIGNED division 


! 7/2 = +3 & +1 rmdr; 010-> 02, 111-> 01, 000-> 00 


ly ol TS ALUin ALUout 
mov S00, sy : msh dividend -> Y reg 
!000 111 . 
Est 300 ! initialize cc with sign dividend 
PV0.0: iti -g 
divsce %01,%02,%01 ! 0001-0010 1111 divide step l 
Pie oT jalio. <2 
divsce %01,%02,%01 ! 1111+0010 0001 divide step 2 
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'001 1]/01 0O 
divscce %01,%02,%01 ! 0011-0010 0001 divide step 3 
'001 011 0 
Est 300 ! dividend & quotient sign? 
'001 O11 0 
bl,a ee 
rool O11 
add %$01,1,%0 ! adjust quotient if negative from 
'001 O11 1's to 2's complement form 
li:mov %y,%o0 LOR OO) retrieve remainder 
!' -11/3 = -3 & -2 rmdr; 011-> 02, 101-> o1, 110-> 00 
ly ol TS ALUin ALUout 
mov S00, SY ! msh dividend -> Y reg 
P10.) SLO 
tst $00 ! initialize cc with sign dividend 
VAL O* <2 Oa. “el 
divsce %01,%02,%01 ! 11014+0011 0000 divide step 1 
'000 Ojf11 0 
divscc %01,%02,%01 ! 0000-0011 1101 divide step 2 
'101 1110 1 
divsce %o01,%02,%01 ! 1011+0011 1110 divide step 3 
'110 100 4. 
tst 00 | dividend & quotient sign? 
'110 100 1 
bl,a 1f 
P10 200 
add SOL) Ly ! 100+001 Lod adjust quotient if negative from 
'110 101 1's to 2's complement form 
l:mov %y,%o0 1110 -> 00 retrieve remainder 
Examples of UNSIGNED division
! 11/3 = 3 & 2 rmdr; 011-> o2, 011-> o1, 001-> o0
!Y    o1    TS                  ALUin      ALUout
        mov     %o0,%y          !                    msh dividend -> Y reg
!001  011
        tst     %g0             !                    initialize cc as non negative
!001  0|11  0                                        dividend
        divscc  %o1,%o2,%o1     ! 0010-0011  1111    divide step 1
!111  1|10  1
        divscc  %o1,%o2,%o1     ! 1111+0011  0010    divide step 2
!010  1|01  0
        divscc  %o1,%o2,%o1     ! 0101-0011  0010    divide step 3
!010  011   0                                        TS is last remainder sign
        mov     %y,%o0          ! 010 -> o0          retrieve remainder
!
! reg o0
!010  011   0
        bl,a    1f
!010  011
        add     %o0,%o2,%o0     !                    adjust remainder if negative
!010  011
1:      nop

! 33/5 = 6 & 3 rmdr; 101-> o2, 001-> o1, 100-> o0
!Y    o1    TS                  ALUin      ALUout
        mov     %o0,%y          !                    msh dividend -> Y reg
!100  001
        tst     %g0             !                    initialize cc as non negative
!100  0|01  0                                        dividend
        divscc  %o1,%o2,%o1     ! 1000-0101  0011    divide step 1
!011  0|11  0
        divscc  %o1,%o2,%o1     ! 0110-0101  0001    divide step 2
!001  1|11  0
        divscc  %o1,%o2,%o1     ! 0011-0101  1110    divide step 3
!110  110   1                                        TS is last remainder sign
        mov     %y,%o0          ! 110 -> o0          retrieve remainder
!
! reg o0
!110  110   1
        bl,a    1f
!110  110
        add     %o0,%o2,%o0     ! 110+101  011       adjust remainder if negative
!011  110
1:      nop
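
The reduced-word-size traces above can be replayed with a small behavioural model. The sketch below is not a definition of the DIVScc instruction; it assumes only what the traces show: the ALU is one bit wider than the word, the ALU input is Y shifted left with the top bit of the dividend register shifted in, the divisor is subtracted when the true sign (TS) equals the divisor sign and added otherwise, and the new quotient bit is 1 when the new TS equals the divisor sign. The function name `divide_steps` and its parameters are hypothetical.

```python
def divide_steps(y, r, divisor, ts, width=3, signed=True):
    """Replay `width` divide steps on a reduced word size.

    y       -- Y register (most significant half of dividend / partial remainder)
    r       -- register holding the least significant half of dividend / quotient
    divisor -- divisor as a `width`-bit pattern
    ts      -- true sign (N xor V) before the first step
    Returns (y, r, ts) after the last step.
    """
    mask = (1 << width) - 1
    alu_mask = (1 << (width + 1)) - 1
    # Extend the divisor to the ALU width: sign-extend if signed, else zero-extend.
    if signed and (divisor >> (width - 1)) & 1:
        d = divisor | (1 << width)
    else:
        d = divisor
    d_sign = d >> width
    for _ in range(width):
        alu_in = ((y << 1) | (r >> (width - 1))) & alu_mask
        if ts == d_sign:
            alu_out = (alu_in - d) & alu_mask   # same sign: subtract divisor
        else:
            alu_out = (alu_in + d) & alu_mask   # different sign: add divisor
        ts = alu_out >> width
        qbit = 1 if ts == d_sign else 0
        y = alu_out & mask
        r = ((r << 1) | qbit) & mask
    return y, r, ts


# -11/3 (signed): Y=110, o1=101, TS=1 after tst %o0
print(divide_steps(0b110, 0b101, 0b011, 1))
# 33/5 (unsigned): Y=100, o1=001, TS=0 after tst %g0
print(divide_steps(0b100, 0b001, 0b101, 0, signed=False))
```

The returned Y and quotient register still need the final adjustments shown in the traces: add 1 to a negative quotient in the signed case, and add the divisor back to a negative remainder in the unsigned case.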


5.5.2 Signed Division with Doubleword Dividend (divs2) 


This subroutine for signed division of a 64-bit dividend by a 32-bit divisor
produces a 32-bit signed quotient and a 32-bit remainder. Special treatment is
given to borderline overflow when the absolute value of the quotient is 2^31, in
order to support the math operator INTEGER PART OF: Q=-2^31 does not overflow;
Q=+2^31 overflows with a special overflow code.


The remainder is zero if the division is exact; otherwise, the remainder has the
same sign as the original dividend. There is a check for divide by zero and a
check for overflow with a non-zero divisor. The check for divide by zero is kept
separate to support the SPARC-recommended trap for divide by zero. In
applications where the user knows the numerical ranges of the operands, or
controls them, these checks can be omitted. Division with a divide-by-zero fault
takes 6 cycles, sets the overflow flag in the integer condition code, and leaves
0xfffff800 in register out3.


Division with non-zero divisor overflow takes 17 to 23 cycles (17 or 19 if the origi- 
nal dividend is positive, 18 or 23 if the original dividend is negative); it sets the 
overflow flag in the integer condition code, and leaves 0x800 in register out3. 


Division leading to a quotient of absolute value 2^31 takes 20 cycles if the
original dividend is positive, and 23 cycles if the original dividend is
negative. It leaves the correct remainder in register out0, -2^31 in out1 as
quotient, and 0 in out3. It clears the overflow condition code if the actual
quotient is -2^31 and sets the overflow condition code if the actual quotient
is +2^31.


Division without fault takes 49 to 60 cycles; it clears the overflow condition
code, and leaves 0 in register out3. Exact division with last partial remainder
= 0 takes 49 cycles. Exact division with last partial remainder = ±divisor, as
happens with non-restoring division algorithms, takes 53 or 54 cycles. Inexact
division, with non-zero final remainder, takes 56 to 60 cycles.
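
The result conventions described above can be summarized in a short reference model. This is a sketch of the documented outcomes only, not of the divide-step algorithm; `divs2_model` is a hypothetical name, and the values of out0 and out1 on the two fault paths are not specified by the text, so the model returns `None` for them.

```python
def divs2_model(dividend, divisor):
    """Model of the documented divs2 results.

    dividend -- signed 64-bit value, divisor -- signed 32-bit value.
    Returns (remainder, quotient, out3, overflow_flag).
    """
    if divisor == 0:
        return None, None, 0xFFFFF800, True      # divide-by-zero indicator
    # Quotient truncates toward zero, so the remainder takes the dividend's sign.
    q = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        q = -q
    r = dividend - q * divisor
    if abs(q) > 2**31:
        return None, None, 0x800, True           # overflow with non-zero divisor
    if abs(q) == 2**31:
        # Borderline case: out1 holds -2^31; V is set only if the true quotient is +2^31.
        return r, -2**31, 0, q > 0
    return r, q, 0, False


print(divs2_model(-11, 3))
print(divs2_model(2**31, 1))
```

Note that Python's own `//` and `%` round toward negative infinity, which is why the model divides the absolute values and restores the sign afterwards.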





!Calling Convention
!       mov     %i0,%o0         !msh dvdnd->o0
!       mov     %i1,%o1         !lsh dvdnd->o1
!       call    divs2           !DIVISION SUBROUTINE CALL
!       orcc    %g0,%i2,%o2     !dvsr->o2 & test
!Register Map
!       reg#
!       out0    msh dividend/remainder
!       out1    lsh dividend/quotient
!       out2    divisor
!       out3    overflow indication
!                 overflow divide by zero/0xfffff800 and V=1
!                 overflow divide by non-zero/0x800 and V=1
!                 overflow quotient =+2^31/0 and V=1
!                 no overflow/0 and V=0
!       out4    scratch for final remainder calculations
!       out5    absolute value of divisor
!       y       msh dividend/successive partial remainders
! call to divs2 must be made with cc indicating sign of divisor
        .global divs2
divs2:  bne     0f              !go on if divisor not zero
        mov     %o2,%o5         !copy divisor in o5, D
        sethi   0x1fffff,%o3    !divide by zero indicator
        retl                    !exit with
        addcc   %o3,%o3,%o3     !overflow set
0:      bl,a    1f
        sub     %g0,%o5,%o5     !if divsr neg, D=-divsr
1:      mov     %o0,%y          !msh dvdnd->Y
        tst     %o0             !initialize cc for first divide step
                                !with sign dividend for signed divide
        bl      2f              !skip ahead for negative dividend
        DIVSCC (9,0xd,9)        !divide step 1
                                !equivalent to divscc %o1,%o5,%o1
        !don't change cc except by DIVSCC until last divide step done
        bl      3f              !ok if different
        mov     %g0,%o3         !clear overflow indicator
        srl     %o1,1,%o4       !get lsh rmdr
        bg      8f              !if msh rmdr >0 then overflow
        subcc   %o4,%o5,%g0     !if lsh rmdr <D then Q is +/-2^31
                                !& o4 is correct final rmdr
        bge     8f              !check if overflow on Q = +2^31
        sethi   0x200000,%o1    !set -2^31 -> Q
                                !else overflow
        tst     %o2             !if original divisor >0
        bg,a    9f              !which implies quotient =+2^31
        addcc   %o1,%o1,%g0     !set ovrflw cc with o3 = 0
9:      retl                    !exit
        mov     %o4,%o0         !with correct remainder in o0
8:      sethi   0x200001,%o3    !overflow divide by non-zero indicator
        retl                    !exit with
        addcc   %o3,%o3,%o3     !overflow set
2:      bge     3f              !ok if different
        mov     %g0,%o3         !clear overflow indicator
        mov     %y,%o0          !get msh rmdr
        addcc   %o0,1,%o0       !test msh rmdr against -1
        bne     8f              !if <-1 then overflow
        srl     %o1,1,%o4       !get lsh rmdr except for leading 1
        sethi   0x200000,%o1    !set -2^31 -> Q
        or      %o1,%o4,%o4     !insert leading 1 in lsh rmdr
        addcc   %o4,%o5,%g0     !if lsh rmdr >-D then Q is +/-2^31
                                !& o4 is correct final rmdr
        ble     8f              !check if overflow on Q = +2^31
                                !else overflow
        tst     %o2             !if original divisor <0
        bl,a    9f              !which implies quotient =+2^31
        addcc   %o1,%o1,%g0     !set ovrflw cc with o3 = 0
9:      retl                    !exit
        mov     %o4,%o0         !with correct remainder in o0
8:      sethi   0x200001,%o3    !overflow divide by non-zero indicator
        retl                    !exit with
        addcc   %o3,%o3,%o3     !overflow set
3:      DIVSCC (9,0xd,9)        !divide step 2
        DIVSCC (9,0xd,9)        !divide step 3
        ...
        DIVSCC (9,0xd,9)
        DIVSCC (9,0xd,9)        !divide step 32
        be      6f              !if final remainder is zero,
                                !go fix quotient polarity
        mov     %y,%o4          !final remainder from Y to o4
        bg      4f              !skip ahead if rmdr+; continue if rmdr-
        addcc   %o4,%o5,%g0     !is neg rmdr + abs divsr =0
        be,a    6f              !if so, go fix quotient polarity and
        mov     %g0,%o4         !clear rmdr. if not, don't clear
        tst     %o0             !test original dvdnd
        bl      5f              !if neg, go check neg Q
        tst     %o1             !sign Q
        ba      5f
        add     %o4,%o5,%o4     !if orig dvdnd pos and final rmdr neg,
                                !correct rmdr; then go check neg Q
4:      subcc   %o4,%o5,%g0     !is pos rmdr - abs divsr =0
        be,a    6f              !if so, go fix quotient polarity and
        mov     %g0,%o4         !clear rmdr. if not, don't clear
        tst     %o0             !test original dvdnd
        bge     5f              !if pos, go check neg Q
        tst     %o1             !sign Q
        sub     %o4,%o5,%o4     !if orig dvdnd neg and final rmdr pos,
                                !correct rmdr; then go check neg Q
5:      bl,a    6f              !skip ahead if Q pos
        add     %o1,1,%o1       !if neg Q, 1's complement to
                                !2's complement; annul if pos Q
6:      tst     %o2             !check original divisor sign
        bl,a    7f
        sub     %g0,%o1,%o1     !if neg divsr, negate quotient
7:      retl                    !exit
        mov     %o4,%o0         !with correct remainder in o0


5.5.3 Signed Division with Word Dividend (divs1) 


This subroutine for signed division of a 32-bit dividend by a 32-bit divisor
produces a 32-bit signed quotient and a 32-bit remainder. The remainder is zero
if the division is exact; otherwise the remainder has the same sign as the
original dividend. There is no check for divide by zero. It is not possible to
overflow with a non-zero divisor. If the calling routine knows that divide by
zero cannot happen, no test is needed. If divide by zero is possible, a simple
test just after the call can abort the division.


Division without fault takes 47 to 58 cycles. Exact division with last partial
remainder = 0 takes 47 cycles. Exact division with last partial remainder =
±divisor, as happens with non-restoring division algorithms, takes 51 or 52
cycles. Inexact division, with non-zero final remainder, takes 54 to 58 cycles.


!Calling Convention
!       mov     %i1,%o0         !dvdnd->o0
!       orcc    %g0,%i2,%o2     !dvsr->o2 & test
!       call    divs1           !DIVISION SUBROUTINE CALL
!       be      dvby0           !abort division if divide by zero
!Register Map
!       reg#
!       out0    dividend/remainder
!       out1    quotient
!       out2    divisor
!       out4    scratch for final remainder calculations
!       out5    absolute value of divisor
!       y       initially sign extension of dividend/
!               successive partial remainders
! call to divs1 must be made with cc indicating sign of divisor
        .global divs1
divs1:  mov     %g0,%y          !0 -> Y
        mov     %o2,%o5         !copy divisor in o5, D
        bl,a    1f
        sub     %g0,%o5,%o5     !if divsr neg, D=-divsr
1:      tst     %o0             !initialize cc for first divide step with
                                !sign dividend for signed divide
        bl,a    2f
        mov     -1,%y           !-1 -> Y only if dvdnd neg
2:      DIVSCC (8,0xd,9)        !divide step 1
                                !equivalent to divscc %o0,%o5,%o1
                                !leave original dividend in o0
                                !do partial remainders & quotient in o1
                                !don't change cc except by DIVSCC until
                                !last divide step is done
        DIVSCC (9,0xd,9)        !divide step 2
                                !equivalent to divscc %o1,%o5,%o1
        DIVSCC (9,0xd,9)        !divide step 3
        DIVSCC (9,0xd,9)        !divide step 4
        ...
        DIVSCC (9,0xd,9)
        DIVSCC (9,0xd,9)        !divide step 32
        be      6f              !if final remainder =0, go fix quotient polarity
        mov     %y,%o4          !final remainder from Y to o4
        bg      4f              !skip ahead if rmdr+; continue if rmdr-
        addcc   %o4,%o5,%g0     !is neg rmdr + abs divsr =0
        be,a    6f              !if so, go fix quotient polarity and
        mov     %g0,%o4         !clear rmdr. if not, don't clear
        tst     %o0             !test original dvdnd
        bl      5f              !if neg, go check neg Q
        tst     %o1             !sign Q
        ba      5f
        add     %o4,%o5,%o4     !if orig dvdnd pos and final rmdr neg,
                                !correct rmdr; then go check neg Q
4:      subcc   %o4,%o5,%g0     !is pos rmdr - abs divsr =0
        be,a    6f              !if so, go fix quotient polarity and
        mov     %g0,%o4         !clear rmdr. if not, don't clear
        tst     %o0             !test original dvdnd
        bge     5f              !if pos, go check neg Q
        tst     %o1             !sign Q
        sub     %o4,%o5,%o4     !if orig dvdnd neg and final rmdr pos,
                                !correct rmdr; then go check neg Q
5:      bl,a    6f              !skip ahead if Q pos
        add     %o1,1,%o1       !if neg Q, 1's complement to
                                !2's complement; annul if pos Q
6:      tst     %o2             !check original divisor sign
        bl,a    7f
        sub     %g0,%o1,%o1     !if neg divsr, negate quotient
7:      retl                    !exit
        mov     %o4,%o0         !with correct remainder in o0


5.5.4 Unsigned Division with Doubleword Dividend 
(divu2) 


This subroutine for unsigned division of a 64-bit dividend by a 32-bit divisor pro- 
duces a 32-bit unsigned quotient and a 32-bit remainder. Remainder is zero if the 
division is exact, and positive otherwise. There is a check for divide by zero and a 
check for overflow with non-zero divisor. The check for divide by zero is kept 
separate in order to support the SPARC-recommended trap for divide by zero. In 
applications where the user knows the numerical ranges of the operands, or con- 
trols them, these checks can be omitted. 


Division with a divide-by-zero fault takes 6 cycles; it sets the overflow flag
in the integer condition code, and leaves 0xfffff800 in register out3. Division
with a non-zero divisor overflow takes 9 cycles; it sets the overflow flag and
leaves 0x800 in register out3. Division without fault takes 42 cycles, clears
the overflow flag, and leaves 0 in register out3.
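
The two fault indicators come from a `sethi` followed by an `addcc` of the register with itself, which doubles the value and deliberately overflows. The sketch below checks that arithmetic; `sethi` and `addcc` here are small Python stand-ins for the 32-bit behaviour of those instructions (hypothetical helpers, modelling only the result and the V flag).

```python
MASK32 = 0xFFFFFFFF


def sethi(imm22):
    """Place a 22-bit immediate in bits 31..10; the low 10 bits are zero."""
    return (imm22 << 10) & MASK32


def addcc(a, b):
    """32-bit add returning (result, V), where V is signed overflow."""
    s = (a + b) & MASK32
    # Overflow: both operands have the same sign and the result's sign differs.
    v = ((~(a ^ b) & (a ^ s)) >> 31) & 1
    return s, bool(v)


# Divide-by-zero indicator: sethi 0x1fffff,%o3 ; addcc %o3,%o3,%o3
o3 = sethi(0x1FFFFF)
print(tuple(hex(x) if not isinstance(x, bool) else x for x in addcc(o3, o3)))

# Overflow with non-zero divisor: sethi 0x200001,%o3 ; addcc %o3,%o3,%o3
o3 = sethi(0x200001)
print(tuple(hex(x) if not isinstance(x, bool) else x for x in addcc(o3, o3)))
```

The first case adds two large positive values and lands on a negative result; the second adds two negative values and lands on a positive one. Both set V, which is how a single instruction pair delivers the indicator value and the overflow flag together.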


!Calling Convention
!       mov     %i0,%o0         !msh dvdnd->o0
!       mov     %i1,%o1         !lsh dvdnd->o1
!       call    divu2           !DIVISION SUBROUTINE CALL
!       orcc    %g0,%i2,%o2     !dvsr->o2 & test
!Register Map
!       reg#
!       out0    msh dividend/remainder
!       out1    lsh dividend/quotient
!       out2    divisor
!       out3    overflow indication
!                 overflow divide by zero/0xfffff800 and V=1
!                 overflow divide by non-zero/0x800 and V=1
!                 no overflow/0 and V=0
!       y       msh dividend/successive partial remainders
! call to divu2 must be made with cc indicating if divisor zero
        .global divu2
divu2:  bne     1f              !go on if divisor not zero
        mov     %o0,%y          !msh dvdnd->Y
        sethi   0x1fffff,%o3    !divide by zero indicator
        retl                    !exit with
        addcc   %o3,%o3,%o3     !overflow set
1:      subcc   %o0,%o2,%g0     !is msh dvdnd < dvsr
        bcs     2f              !ok if so
        orcc    %g0,0,%o3       !initialize cc for first divide step
                                !with positive sign for unsigned divide
                                !clear overflow indicator
        sethi   0x200001,%o3    !overflow divide by non-zero indicator
        retl                    !exit with
        addcc   %o3,%o3,%o3     !overflow set
2:      DIVSCC (9,0xa,9)        !divide step 1
                                !equivalent to divscc %o1,%o2,%o1
                                !don't change cc except by DIVSCC until
                                !last divide step is done
        DIVSCC (9,0xa,9)        !divide step 2
        DIVSCC (9,0xa,9)        !divide step 3
        ...
        DIVSCC (9,0xa,9)
        DIVSCC (9,0xa,9)        !divide step 32
        bl      3f              !skip ahead if rmdr-
        mov     %y,%o0          !final rmdr from Y to o0
        retl                    !exit
        addcc   %o0,0,%o0       !clear ovrflw cc if on
3:      retl                    !exit
        addcc   %o0,%o2,%o0     !correct rmdr & clear ovrflw cc if on


5.5.5 Unsigned Division with Word Dividend (divu1) 


This subroutine for unsigned division of a 32-bit dividend by a 32-bit divisor
produces a 32-bit unsigned quotient and a 32-bit remainder. The remainder is
zero if the division is exact, and positive otherwise. There is no check for
divide by zero. It is not possible to overflow with a non-zero divisor. If the
calling routine knows that divide by zero cannot happen, no test is needed. If
divide by zero is possible, a simple test just after the call can abort the
division.


If not aborted, the division takes 39 cycles; it clears the overflow flag and
leaves 0 in register out3. If the remainder is of no interest and only the
quotient corresponding to INTEGER(dvdnd/dvsr) or FLOOR(dvdnd/dvsr) for unsigned
numbers is wanted, then the last steps of this routine can be modified as
indicated. Quotient-only unsigned division takes 36 cycles.


!Calling Convention
!       mov     %i1,%o1         !dvdnd->o1
!       orcc    %g0,%i2,%o2     !dvsr->o2 & test
!       call    divu1           !DIVISION SUBROUTINE CALL
!       be      dvby0           !abort division if divide by zero
!Register Map
!       reg#
!       out0    remainder
!       out1    dividend/quotient
!       out2    divisor
!       out3    0 if divide by non zero
!       y       zero/successive partial remainders
        .global divu1
divu1:  mov     %g0,%y          !0->Y
        orcc    %g0,%g0,%o3     !initialize cc for first divide step
                                !with positive sign for unsigned divide
                                !clear divide by zero indicator
        DIVSCC (9,0xa,9)        !divide step 1
                                !equivalent to divscc %o1,%o2,%o1
                                !don't change cc except by DIVSCC until
                                !last divide step is done
        DIVSCC (9,0xa,9)        !divide step 2
        DIVSCC (9,0xa,9)        !divide step 3
        ...
        DIVSCC (9,0xa,9)
        DIVSCC (9,0xa,9)        !divide step 31
        retl                    !exit for quotient-only divide
                                !(leave this retl out when the remainder is wanted)
        DIVSCC (9,0xa,9)        !divide step 32
!ALL the following steps may be omitted for quotient-only divide
        bl      3f              !skip ahead if rmdr-
        mov     %y,%o0          !final rmdr from Y to o0
        retl                    !exit
        addcc   %o0,0,%o0       !clear ovrflw cc if on
3:      retl                    !exit
        addcc   %o0,%o2,%o0     !correct rmdr & clear ovrflw cc if on
5.5.6 Divide Step In Support Of A To D Converter
Compensation


The following code fragment shows compensation for errors in the quantization
codes of an analog-to-digital converter that has been calibrated with the Walsh
Transform techniques developed at Schlumberger (Fairchild) Test Systems. Refer
to "A System For Converter Testing Using Walsh Transform Techniques" by E.A.
Sloane, presented as paper 11.3 at the IEEE International Test Conference,
October 1981.


As the paper shows, for well-designed and well-manufactured analog-to-digital
converters, the relation between codes and the actual voltage at the midpoint of
each quantization bin is as close to linear as technology and economics permit,
so the power-of-two-order Walsh coefficients dominate the cross terms.
Consequently, this example uses only the quantization bits as they are and does
not cover the exclusive-or combinations between some of the more significant
bits. Each bit of additional accuracy requires only one more instruction pair:
an add that sets the condition codes, and a divide step. Doing this with table
lookup would require doubling the table size, consuming data cache. Simple gain
and offset corrections based on a least-squares linear fit do not offer as much
accuracy, and are usually based on static rather than dynamic tests; dynamic
tests are more representative of actual use.


The operation shown in the code fragment is: 


Yreg = ±2^9 x A9 ± 2^8 x A8 ... ± 2^0 x A0


At each stage, whether the next term is added or subtracted depends on whether
the corresponding quantization bit, in a register pointed to by symbol x, is 0
or 1.


        mov     0,%y            !clear Yreg
        addcc   x,x,x           !left shift code from upper bits of register x
                                !with msb setting N & V to force true sign
        divscc  %g0,A9,%g0      !only add or subtract immediate value to Yreg
                                !no other register is affected
        addcc   x,x,x
        divscc  %g0,A8,%g0
        addcc   x,x,x
        divscc  %g0,A7,%g0
        addcc   x,x,x
        divscc  %g0,A6,%g0
        addcc   x,x,x
        divscc  %g0,A5,%g0
        addcc   x,x,x
        divscc  %g0,A4,%g0
        addcc   x,x,x
        divscc  %g0,A3,%g0
        addcc   x,x,x
        divscc  %g0,A2,%g0
        addcc   x,x,x
        divscc  %g0,A1,%g0
        addcc   x,x,x
        divscc  %g0,A0,%g0
        mov     %y,%g1          !g1 holds compensated value of quantization code
                                !from x scaled by a factor chosen to make most
                                !use of the 13 bit precision available for
                                !immediate values.
                                !here with 10 bits, results are scaled by 2^9
                                !relative to coefficients.


As an example, a 10 bit offset binary analog to digital converter might be set to 
operate over a range of -5.12 to +5.12 volts with nominal 10 millivolt quantization 
resolution. If ideal, with no errors, the coefficients for each bit expressed as milli- 
volts would be: 


m       9      8      7     6     5     4    3    2    1    0
a(m)  -2560  -1280  -640  -320  -160  -80  -40  -20  -10   -5


If the process technology is limited to ±0.5% accuracy of the converter's
resistive ladder, then the actual coefficients for each bit in millivolts could
be:


m 9 8 7 6 5 4 3 2 1 0 
a(m) -2572.59 -1274.24 -642.94 -319.97 -159.87 -80.34 -39.86 -20.02 -10.05 -4.98 


These coefficients would be scaled by 2^(9-m), corresponding to the order of
entering Yreg, which gets left-shifted each time, and rounded to integer.


m       9      8      7      6      5      4      3      2      1      0
A(m)  -2573  -2548  -2572  -2560  -2558  -2571  -2551  -2563  -2572  -2547
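
The accumulation performed by the add and divide-step pairs can be sketched in a few lines. This is a model under stated assumptions, not the instruction sequence itself: it assumes each step doubles Yreg and then subtracts the (negative) coefficient when the code bit is 1 and adds it when the bit is 0, which is the sign convention that makes the ideal coefficients reproduce the bin midpoints of an offset-binary converter. The names `scale`, `compensate`, `IDEAL_MV`, and `TABLE_A` are hypothetical.

```python
# Ideal per-bit coefficients a(9)..a(0) in millivolts, from the table above.
IDEAL_MV = [-2560, -1280, -640, -320, -160, -80, -40, -20, -10, -5]

# Scaled coefficients A(9)..A(0) for the +/-0.5% converter, as printed above.
TABLE_A = [-2573, -2548, -2572, -2560, -2558, -2571, -2551, -2563, -2572, -2547]


def scale(coeffs_mv):
    """A(m) = round(a(m) * 2^(9-m)); list index 0 corresponds to m = 9."""
    return [round(a * 2 ** i) for i, a in enumerate(coeffs_mv)]


def compensate(code, scaled):
    """Accumulate Yreg for a 10-bit code, most significant bit first."""
    y = 0
    for i, a in enumerate(scaled):
        bit = (code >> (9 - i)) & 1
        # Assumed convention: bit = 1 subtracts the coefficient, bit = 0 adds it.
        y = (y << 1) + (-a if bit else a)
    return y


ideal = scale(IDEAL_MV)
print(ideal)
print(compensate(0x3FF, ideal) / 2 ** 9)    # top code, in millivolts
print(compensate(0x200, ideal) / 2 ** 9)    # first code above mid-scale
print(compensate(0x200, TABLE_A) / 2 ** 9)  # same code with calibrated coefficients
```

With the ideal coefficients every scaled value is the same, which shows why scaling by 2^(9-m) uses the immediate field evenly across all ten bits.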


Driving the analog-to-digital converter with a 4.000 volt, 5 MHz sine wave,
sampling at 64 MHz, and collecting 64 consecutive samples allows spectrum
analysis with an FFT to determine the effective bits under the test conditions.
Because of the sine wave frequency relative to the sample frequency, the
significant distortion harmonics do not alias into the fundamental frequency
analysis bin. The number of effective bits is approximately:


                (         power spectrum at fundamental          )
    0.5 x log   ( ---------------------------------------------- )
                ( sum of power spectrum at all other frequencies )
    ----------------------------------------------------------------
                                log (2)
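
As a quick numerical check of the expression above, the following sketch evaluates it for a given pair of power values. The function name `effective_bits` and its arguments are hypothetical; the inputs would come from an FFT power spectrum of the captured samples.

```python
import math


def effective_bits(power_fundamental, power_others):
    """0.5 * log(P_fundamental / P_others) / log(2)."""
    return 0.5 * math.log(power_fundamental / power_others) / math.log(2)


# A power ratio of 4^10 corresponds to an amplitude ratio of 2^10, i.e. 10 bits.
print(effective_bits(4.0 ** 10, 1.0))
```

The factor 0.5 converts a power ratio to an amplitude ratio, and dividing by log(2) expresses that amplitude ratio as a number of binary digits.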


The nominal 10-bit converter with ideal coefficients at each code bit shows 9.52
effective bits under dynamic rather than static testing. The converter with
±0.5% errors in the resistive ladder, taken at nominal value without Walsh-based
calibration, shows 7.57 effective bits. With Walsh-based calibration, it shows
9.05 effective bits. A least-squares straight-line fit for compensation shows
only 7.57 effective bits, but with reduced error in measuring peak amplitude.


This less obvious use of divide step allows fast compensation for an
appropriately calibrated analog-to-digital converter. In this example, about 3/4
of the lost effective bits are recovered at the price of two cycles per
quantization bit plus 2 cycles of overhead.


5.6 Using the SCAN Instruction 


The code examples in this section illustrate the use of the SCAN instruction. In the 
first example, SCAN is used to simplify and speed up floating-point normaliza- 
tion. 


5.6.1 Scan in Support of Software Floating Point 


The following code fragment shows post-normalization of a floating-point add or
subtract for the case where the result requires calculating the difference of
the magnitudes of the numbers. The IEEE 754 format, which is also used in the
SPARC architecture, is assumed. This format uses a sign, an offset exponent, a
hidden leading bit when normalized, and a fraction. Only the logic for
normalized numbers is shown here. Number values are in sign-and-magnitude form
rather than two's complement.


    31 30        23 22                       0
    | s |    e     |            f            |

normalized values
    X = (-1)^s x 2^(e-127) x (1 + f x 2^-23)
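
The field layout and the normalized-value formula can be checked against a host machine's own single-precision encoding. This is a small sketch using Python's standard `struct` module; the helper names `decode_single` and `normalized_value` are hypothetical.

```python
import struct


def decode_single(x):
    """Split an IEEE 754 single-precision value into (sign, exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31              # bit 31
    e = (bits >> 23) & 0xFF     # bits 30..23, offset by 127
    f = bits & 0x7FFFFF         # bits 22..0, hidden leading 1 not stored
    return s, e, f


def normalized_value(s, e, f):
    """X = (-1)^s * 2^(e-127) * (1 + f * 2^-23), valid for normalized numbers."""
    return (-1) ** s * 2.0 ** (e - 127) * (1 + f * 2.0 ** -23)


fields = decode_single(-6.5)
print(fields)
print(normalized_value(*fields))
```

The formula applies only when the exponent field is neither 0 nor 255; zeros, denormalized numbers, infinities, and NaNs are outside the scope of the fragment in this section.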


The operation is x+y=z or x-y=z. For a subtract, the sign of y is complemented.
The magnitudes of the numbers have to be compared and the one with the lesser
exponent right-shifted to align its binary point with the greater. If the
exponents are equal and the signs differ, the magnitudes must be compared to
determine the sign of the result. This is assumed to have taken place before the
code fragment shown here, which shows the logic of handling numbers with
different signs and different exponents. Symbol x points to the larger number; y
to the smaller.


sethi 


sll 
xOr 
srl 
and 
Sr. 
and 
sub 
andn 
OF 
srl 
sub 
sll 
addcc 


andn 
OL 
subx 


subcc 
bl 
sub 


O47 UX iy oe 
EOE COS 
Go, UxXEL, gs 


cP oll kG 


393,%gG4,%g3 
693,%g1,%g2 
$g0,%g1, gl 
BG3; 691; 6G3 
%g3,%g93, %g0 


6g1,%g4,%sg1 
$g1,%g92,%g1 


692,32, %g0 
Lae 
6gG2,8,6g2 


‘mask for sign and exponent with and 
lor for fraction with andn 


'single one at bit 23 for hidden bit 
lx exponent 


'y exponent 

'alignment difference 

ly frackaon 

'y hidden bit 

‘downshift y magnitude to g2 
'complement of shift 

'upshift left over y for test 
'test left over for rounding 
‘note: not IEBE754 rounding here 
isc “LracrLon 

'x hidden bit 

'difference of magnitudes with 
'simple rounding 


'scan difference for leading one. 

'Use of 0 as the scan mask is because 
'of sign magnitude arithmetic assumed 
'in this example. Leading 8 bits are 
'guaranteed to be zero because of 
'format. Question is, how many more 
'cill the first one? 

'Tf two's complement arithmetic had 
'been assumed, then there could have 
‘been leading ones or leading zeros 
'depending on sign of result. Then 
'instead of 0 as mask, scan would have 
'used sgl as mask as well as value. 
‘Question would have been, how many 
'leading bits are the same as the sign? 
ftrest if all significant bits lost 


!'remove effect of format's 8 leading O's 


‘underflow due to loss of significant bits code would follow here 
1:      sll     %g1,%g2,%g1     !normalize result
        andn    %g1,%g4,%g1     !hide leading bit
        srl     x,23,%g3
        and     %g3,0xff,%g4    !x exponent in g4
        subcc   %g4,%g2,%g0     !test exponent underflow
        bg      2f
        sub     %g3,%g2,%g3     !subtract normalization shift from
                                !result sign and exponent
        !exponent underflow code would follow here
2:      sll     %g3,23,%g3      !place sign and exponent result in
                                !format position
        retl                    !exit (2 cycles)
        or      %g1,%g3,z       !combine with fraction


Each instruction in this code fragment runs in one cycle out of the instruction
cache, except for the leaf return, which takes two. That makes 32 cycles for
this fragment. Without scan as a hardware instruction, the function would have
to be performed by a software routine that takes 43 to 52 cycles in the usual
cases, and the fragment would take 74 to 83 cycles, more than double. A software
substitute for scan would also consume instruction cache space, and attempts to
speed up its binary tree search with look-up tables based on leading bits would
consume data cache space.
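
The way the scan result drives normalization can be sketched as follows. This is a behavioural illustration, not a definition of the SCAN instruction: it assumes, as the comments in the fragment indicate, that scanning with a zero mask yields the number of leading zero bits, and it assumes a result of 63 when no one bit is found. The names `scan_leading_one` and `normalize` are hypothetical.

```python
def scan_leading_one(value):
    """Number of zero bits above the first 1 in a 32-bit word; 63 if value is 0."""
    value &= 0xFFFFFFFF
    if value == 0:
        return 63                      # assumed "nothing found" result
    return 32 - value.bit_length()


def normalize(magnitude, exponent):
    """Normalize a magnitude difference whose hidden-bit position is bit 23.

    Returns (magnitude, exponent) after the left shift, or None where the
    fragment would branch to underflow handling.
    """
    shift = scan_leading_one(magnitude)
    if shift >= 32:
        return None                    # all significant bits lost
    shift -= 8                         # remove the format's 8 leading zeros
    if exponent - shift <= 0:
        return None                    # exponent underflow
    return (magnitude << shift) & 0xFFFFFF, exponent - shift


print(normalize(0x00800000, 130))      # already normalized
print(normalize(0x00000001, 130))      # needs a 23-bit left shift
print(normalize(0, 130))               # nothing left to normalize
```

A pure-software replacement for `scan_leading_one` is the binary tree search the text refers to, which is where the 43 to 52 cycle cost comes from.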


5.6.2 Scan in Support of Run Length Encoding 


The following code fragment shows compression of long binary strings by looking
for runs of all ones or all zeros and coding them so that lossless
reconstruction is possible. In this example, runs shorter than four are not
coded but transmitted directly, and runs longer than sixteen are broken up for
coding efficiency and simplicity. The best compression occurs for long binary
strings with low information content, such as background sections of
black-and-white raster lines.


code     value
00000    reserved
00001       "
00010       "
00011       "
00100    0000 ...                  or  1111 ...
00101    0000 0 ...                or  1111 1 ...
00110    0000 00 ...               or  1111 11 ...
  :
01111    0000 0000 0000 000 ...    or  1111 1111 1111 111 ...
10000    0000 0000 0000 0000 ...   or  1111 1111 1111 1111 ...
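
The run-splitting rule behind the table can be illustrated with a simplified model. This sketch is not the coding scheme of the fragment: it ignores the 5-bit code packing, the 4-bit fixed-length groups, and the toggle indicator, and only shows how a bit string divides into coded runs of length 4 to 16 and directly transmitted shorter pieces. The names `rle_tokens` and `rle_decode` and the token format are hypothetical.

```python
def rle_tokens(bits):
    """Split a string of '0'/'1' into ('run', bit, length) and ('lit', bits) tokens.

    Runs of 4 to 16 identical bits become run tokens; longer runs are broken
    into pieces of at most 16; runs shorter than 4 are passed through as literals.
    """
    tokens = []
    i = 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i] and j - i < 16:
            j += 1
        length = j - i
        if length >= 4:
            tokens.append(("run", bits[i], length))
        else:
            tokens.append(("lit", bits[i:j]))
        i = j
    return tokens


def rle_decode(tokens):
    """Rebuild the original bit string, showing that the split is lossless."""
    return "".join(t[1] * t[2] if t[0] == "run" else t[1] for t in tokens)


line = "0" * 40 + "101" + "1" * 5
tokens = rle_tokens(line)
print(tokens)
print(rle_decode(tokens) == line)
```

In the fragment, the run length is found in a single cycle by scanning a register with itself as the mask; the inner `while` loop here stands in for that step.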
The code fragment omits starting up the loop, reloading buffers with new data,
storing code, and terminating the loop. Symbol x points to the data segment in
some register ready for compression, and symbol y points to its immediate
successor.


0 scan 


subcc 
bge 
SubCcC 


‘handle fixed length code, 


srl 
or 
sll 
addcc 
bcs 
addcc 
bcs 
mov 
ba 
mov 


2 bcc 
mov 
mov 


33 srl 
or 
sli 
ba 
subcc 


x, Xk, oC 


S6g1,4,%g0 
iis 
6G iy 1:6 ;og0 


x2 8; eZ 
692,16; 6g2 
Ripa 

0 ee am 

pane 

x) x, og0 

SE 

1,%q4 

Sf 

0,%g4 


of 
1,%g4 


V7, 25,93 
RoGoy 
y,4,y 

ot 
$95,4,%g5 


‘handle run length code 


is lowe 
sll 


AE 
6g4,1,%g4 


'scan for how many bits are same as ms 
'gl = 1 to 31 or 63 if all in x regist 


low 
ers 


'xy 1s used as both the value to be scanned(rs1) 


'and the mask(rs2). 

'test if run at least length 4 

'test if run greater than length 16 
g1<4 

'extract leading 4 bits of x as compre 

finsert leading bit of code for fixed 


'shift rest of x in 2 steps 
'complete x shift and test last 
!separate cases for 1 or 0 
'test without shifting first of 
'if last out bit =0 and first remainin 
'set new low priority toggle indicator 


lotherwise clear toggle indicator 


ssion code 
length 


of 4 bits outgoing 


remaining bits 


g bit =1 


'fixed length code overwrites any pending toggle 


'1f last out bit =1 and first remaingi 
'set new low priority toggle indicator 
'otherwise clear toggle indicator 


ng bit =0 


!'fixed length code overwrites any pending toggle 


lextract leading 4 bits of y 
'move them to right end of x 
'shift rest of y with incomming traili 


'decrement counter of how many bits of 


'skip ahead if run less than 16 
'shift incomming toggle indic. to high 
!handle runs at least 16
        mov     16,%g2          !set compression code to 16
        sll     x,16,x          !ignore leading 16 bits of x and shift rest of x
        srl     y,16,%g3        !extract leading 16 bits of y
        or      x,%g3,x         !move them to right end of x
        sll     y,16,y          !shift rest of y with incoming trailing zeros
        ba      5f
        subcc   %g5,16,%g5      !decrement counter of how many bits of x left
!handle runs of length 4 to 15
4:      mov     %g1,%g2         !set compression code to scan result
        sub     %g0,%g1,%g1     !complement scan result
        sll     x,%g2,x         !ignore leading g2 bits of x and shift rest of x
        srl     y,%g1,%g3       !extract leading 32-g1 bits of y
        or      x,%g3,x         !move them to right end of x
        sll     y,%g2,y         !shift rest of y with incoming trailing zeros
        subcc   %g5,%g2,%g5     !decrement counter of how many bits of x left
        or      %g4,1,%g4       !toggle following compression code too
                                !one compression code to go
5:      bg      6f              !skip ahead if there are still bits of x left
        subcc   %g6,1,%g6       !decrement counter of code fields left


!code for reloading y and shifting part of it into x if the old y had
!trailing zeros and resetting g5 to 32-#trailing zeros.


6:      bg      7f              !skip ahead if room for more codes
        andcc   %g4,2,%g0       !test if toggle has priority
!code for storing codes and reinitializing g6


as sll Fo i ge 'make room for new code 
be,a Ob lif g4 bitl off then no additional code 
!if g4 bitl on then insert toggle code first 
or Z,6G2;,Z linsert new data code 
andn %g4,2,%g4 !'clear high priority toggle indicator 
!without disturbing low priority toggle indicator 
ba 5b !'check how much code space left and append toggle 
Or 2 OSE oe 'back through 5,6,7 just once 


Each instruction in this code fragment runs in one cycle out of the instruction
cache if it is in the active path for a particular case. Scan is in the active
path for all cases. Without a hardware implementation of scan, the function
would require a software subroutine taking 43 to 52 cycles instead of 1 cycle.
That routine would also consume instruction cache space, and alternate versions
that attempt to speed up the binary tree search with table look-up, using
leading bits as an index, would consume data cache space.


5.7 Multiply Routines Using the MULScc Instruction 


This section shows examples of doing integer multiplication using the multiply 
step instruction. With hardware implementation of multiply in SPARClite, these 
routines are not required for usual situations. However, these examples illustrate 
how MULScc works and may serve as models for use in unusual situations. 


These sample routines do not set the integer condition codes in exactly the same
way as the SPARC Version 8 integer multiply instructions SMULcc and UMULcc.





5.7.1 Simple Multiply Step Examples 


In each of the following examples a cycle by cycle view of multiply step is given. 


Multiply Step With Reduced Word Size (32 to 3 Bits) 


Register Use: 

out0 Multiplier 

outl Multiplicand 

out2 most significant half Product 

out3 least significant half Product 

Note: TS, True Sign = N xor V from condition codes 


a ee ee ee 


Examples of SIGNED multiplication 


! 2 Se 67. O10: => ol, OLE. => 00 


! o2 Y TS ALUin ALUout 

mov 600, SY ! multiplier -> Y reg 
! O11 

andcc %g0,0,%02 ! clear product accumulator & cc 
!00 10 01/1 0O 

mulscc %02,%01,%02 ! 000+010 010 active multiply step 1 
10110 00/1 0 

mulscc %02,%01,%02 ! OO1+01L0 - O1L1L active multiply step 2 
!QO1]1 00;0 O 

mulscce %02,%01,%02 ! 001+000 O01 active multiply step 3 
1O00{1 10/0 0 

mulscce %02,0,%02 ! 0004+000 000 final double shift without 
'000 110 0 add to align result 

cou S00 ! multiplier sign? 
1000 IO 0 

bl,a al 
'000 ier 

sub S02, 5017502. 4 adjust msh product if 
'000 110 multiplier negative 

l:mov %y,%03 ! LAO. +> OS retrieve lsh product 
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! -2 * 3 = -6; 110 -> ol, 011 -> 00 

! o2 Bb TS ALUin ALUout 

mov 200, SY | multiplier -> Y reg 
! 011 

andcc %g0,0,%02 ! clear product accumulator & cc 
!00|0 01/1 O 

mulscce %02,%01,%02 ! 000+110 110 active multiply step 1 
'11]0 OOj1 1 

mulscc %02,%01,%02 ! bE: - OE active multiply step 2 
110]1 00/0 1 

mulscc %02,%01,%02 ! 110+000 110 active multiply step 3 
111 |0 10/0 1 

mulsce %02,0,%02 ! 111+000 111 final double shift without 
a Se eas 010 1 add to align result 

tst S00 ! multiplier sign? 
Va 010 0 

bl,a Ae 
'111 010 

sub %02,%01,%02 ! adjust msh product if 
a 010 multiplier negative 

l:mov %y,%0o3 ! 010 -> 03 retrieve lsh product 

! 3 * ~-2 = -6; 011 -> ol, 110 -> 00 

| o2 Y TS ALUin ALVout 

mov 300, Sy ! multiplier -> Y reg 
! 110 

andcc %g0,0,%02 ! clear product accumulator & cc 
10010 £10" 20 

mulsce %02,%01,%02 ! 000+000 000 active multiply step 1 
!0010 01/1 O 

mulsce %02,%01,%02 ! 000+011 011 active multiply step 2 
NOt 00/1 0 

mulscce %02,%01,%02 ! 001+011 100 active multiply step 3 
'10]0 LAOS. “0 

mulscc 02,0, %02 ! 010+000 010 final double shift without 
!010 010 0 add to align result 

tst 500 : multiplier sign? 
1010 010 1 

bl,a dd 
!010 010 

sub %02,%01,%02 ! 010-011 111 adjust msh product if 
be 010 multiplier negative 

l:mov %y,%03 ! O10). =S~63 retrieve lsh product 


Examples of UNSIGNED multiplication 


110 -> 00 
ALUout 


! 3 * 6 = 
! o2 be 


LBe Obl AS ol, 
TS ALUin 
! multiplier -> Y reg 
: 110 
clear product accumulator & cc 
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'00|0 11/0 0O 





mulscce %02,%01,%02 ! 000+000 000 active multiply step 1 
'00|0 Od pis. oO 

mulscc %02,%01,%02 ! O00+011 011 active multiply step 2 
!O1j|1 OO|1 O 

mulscc %02,%01,%02 ! 001+011 100 active multiply step 3 
!10|0 10|0 O 

mulscec %02,0,%02 ! 010+000 010 final double shift without 
1010 010 0 add to align result 

tst Sol ! msb multiplicand? 
1010 010 0 

bl,a Lt 
1010 010 

add %02,%00,%02 ! adjust msh product if unsigned 
'010 010 multiplicand treated as if 
! negative 

l:mov %y,%03 ! O10.=> 63 retrieve lsh product 

! 6 * 3 = 18; 110 -> o1, 011 -> o0

! o2 Y TS ALUin ALUout

mov %o0,%y ! multiplier -> Y reg
! 011

andcc %g0,0,%o2 ! clear product accumulator & cc
!00|0 01|1 0

mulscc %o2,%o1,%o2 ! 000+110 110 active multiply step 1
!11|0 00|1 1

mulscc %o2,%o1,%o2 ! 111+110 101 active multiply step 2
!10|1 00|0 1

mulscc %o2,%o1,%o2 ! 110+000 110 active multiply step 3
!11|0 10|0 1

mulscc %o2,0,%o2 ! 111+000 111 final double shift without
!111 010 1 add to align result

tst %o1 ! msb multiplicand?
!111 010 1

bl,a 1f
!111 010

add %o2,%o0,%o2 ! 111+011 010 adjust msh product if unsigned
!010 010 multiplicand treated as if
! negative

1: mov %y,%o3 ! 010 -> o3 retrieve lsh product
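

The traces above can be reproduced with a small software model. The following
Python sketch is an illustration only and is not part of any SPARClite tool;
the names mulscc and multiply and the bits parameter are assumptions made for
this example. It models the multiply step on 3-bit registers, as the traces do:
the N xor V bit is shifted into the top of the partial product, the multiplicand
is added only when the low bit of Y is 1, and Y collects the bits shifted out.

```python
def mulscc(rs1, rs2, y, n, v, bits=3):
    """One multiply step on `bits`-wide registers; returns (rd, y, n, v)."""
    mask = (1 << bits) - 1
    msb = 1 << (bits - 1)
    # Shift the partial product right, inserting N xor V at the top.
    op1 = (((n ^ v) << (bits - 1)) | (rs1 >> 1)) & mask
    # Add the second operand only if the low bit of Y is set.
    op2 = (rs2 & mask) if (y & 1) else 0
    rd = (op1 + op2) & mask
    new_n = 1 if rd & msb else 0
    # Signed overflow: operands share a sign that differs from the result's.
    new_v = 1 if (~(op1 ^ op2) & (op1 ^ rd)) & msb else 0
    # Y shifts right and receives the bit shifted out of the partial product.
    new_y = ((rs1 & 1) << (bits - 1)) | (y >> 1)
    return rd, new_y, new_n, new_v


def multiply(multiplier, multiplicand, signed, bits=3):
    """Return (high, low) halves of the product, following the traced sequence."""
    mask = (1 << bits) - 1
    msb = 1 << (bits - 1)
    o0, o1 = multiplier & mask, multiplicand & mask
    y, o2, n, v = o0, 0, 0, 0                      # mov %o0,%y ; andcc %g0,0,%o2
    for _ in range(bits):                          # active multiply steps
        o2, y, n, v = mulscc(o2, o1, y, n, v, bits)
    o2, y, n, v = mulscc(o2, 0, y, n, v, bits)     # final shift without add
    if signed and (o0 & msb):                      # multiplier negative
        o2 = (o2 - o1) & mask
    if not signed and (o1 & msb):                  # multiplicand msb set
        o2 = (o2 + o0) & mask
    return o2, y
```

With these definitions, multiply(-2, 3, signed=True) returns the pair
(0b111, 0b010) and multiply(3, 6, signed=False) returns (0b010, 0b010),
matching the final o2 and Y values of the 3 * -2 and 6 * 3 traces.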


5.7.2 Signed Multiplication Using Multiply Step 


/*
 * Procedure to perform a 32-bit by 32-bit signed multiply.
 * Pass the multiplier in %o0, and the multiplicand in %o1.
 * The least significant 32 bits of the result are returned in %o0,
 * and the most significant in %o1. Multiplies take 47 to 51 instruction cycles.
 *
 *      call    .mul
 *      nop                     ! (or set up last parameter here)
 *
 * Note that this is a leaf routine; i.e., it calls no other routines and does
 * all of its work in the out registers. Thus, the usual SAVE and RESTORE
 * instructions are not needed.
 */
        .global .mul
.mul:   mov     %o0, %y         ! multiplier to Y register
        andcc   %g0, %g0, %o4   ! zero the partial product and clear N and V conditions
        mulscc  %o4, %o1, %o4   ! first iteration of 33
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        .
        .
        .
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4   ! 32nd iteration
        mulscc  %o4, %g0, %o4   ! last iteration only shifts

        ! if %o0 (multiplier) was negative, the result is:
        !       (%o0 * %o1) + %o1 * (2**32)
        ! We fix that here.
        tst     %o0
        rd      %y, %o0
        bl,a    1f
        sub     %o4, %o1, %o4   ! bit 33 and up of the product are in
                                ! %o4, so we don't have to shift %o1
1:      retl                    ! leaf-routine return
        mov     %o4, %o1        ! return high bits


5.7.3 Unsigned Multiplication Using Multiply Step 


/*
 * Procedure to perform a 32-bit by 32-bit unsigned multiply.
 * Pass the multiplier in %o0, and the multiplicand in %o1.
 * The least significant 32 bits of the result are returned in %o0,
 * and the most significant in %o1. Multiplies take 46 or 58 instruction cycles.
 *
 *      call    .umul
 *      nop                     ! (or set up last parameter here)
 *
 * Note that this is a leaf routine; i.e., it calls no other routines and does
 * all of its work in the out registers. Thus, the usual SAVE and RESTORE
 * instructions are not needed.
 */
        .global .umul
.umul:  mov     %o0, %y         ! multiplier to Y register
        andcc   %g0, %g0, %o4   ! zero the partial product and clear N and V conditions
        mulscc  %o4, %o1, %o4   ! first iteration of 33
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        .
        .
        .
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4
        mulscc  %o4, %o1, %o4   ! 32nd iteration
        mulscc  %o4, %g0, %o4   ! last iteration only shifts
/*
 * Normally, with the shift and add approach, if both numbers are
 * positive, you get the correct result. With 32-bit two's-complement
 * numbers, -x can be represented as ((2 - (x/(2**32)) mod 2) * 2**32)
 * To avoid a lot of 2**32's, we just move the radix point up to be
 * just to the left of the sign bit. So:
 *
 *       x *  y = (xy) mod 2
 *      -x *  y = (2 - x) mod 2 * y = (2y - xy) mod 2
 *       x * -y = x * (2 - y) mod 2 = (2x - xy) mod 2
 *      -x * -y = (2 - x) * (2 - y) = 4 - 2x - 2y + xy mod 2
 *
 * For signed multiplies, we subtract (2**32) * x from the partial
 * product to fix this problem for negative multipliers (see .mul in
 * Section 5.7.2).
 * Because of the way the shift into the partial product is calculated
 * (N xor V), this term is automatically removed for the multiplicand,
 * so we don't have to adjust.
 *
 * But for unsigned multiplies, the high order bit wasn't a sign bit,
 * and the correction is wrong. So for unsigned multiplies where the
 * high order bit is one, we end up with xy - (2**32) * y. To fix it
 * we add y * (2**32).
 */
        tst     %o1
        bl,a    1f
        add     %o4, %o0, %o4
1:      rd      %y, %o0         ! return least sig. bits of prod
        retl                    ! leaf-routine return
        mov     %o4, %o1        ! Delay slot; return high bits
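

The two fix-up steps in .mul and .umul can be checked with plain integer
arithmetic. The sketch below is an arithmetic model only, not a simulation of
the routines. It assumes, following the comments in the listings, that the 33
multiply steps leave the product of the multiplier taken as unsigned and the
multiplicand taken as signed. The function names are hypothetical.

```python
M32 = 1 << 32
M64 = 1 << 64


def to_signed(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= M32 - 1
    return x - M32 if x & (1 << 31) else x


def raw_step_product(o0, o1):
    """Assumed 64-bit value before fix-up: unsigned multiplier, signed multiplicand."""
    return ((o0 & (M32 - 1)) * to_signed(o1)) % M64


def mul_signed(o0, o1):
    """Signed fix-up: subtract the multiplicand from the high word if %o0 < 0."""
    p = raw_step_product(o0, o1)
    if to_signed(o0) < 0:
        p -= (o1 & (M32 - 1)) << 32      # sub %o4, %o1, %o4
    return p % M64


def umul(o0, o1):
    """Unsigned fix-up: add the multiplier to the high word if %o1 has its msb set."""
    p = raw_step_product(o0, o1)
    if to_signed(o1) < 0:
        p += (o0 & (M32 - 1)) << 32      # add %o4, %o0, %o4
    return p % M64
```

For example, umul(0xFFFFFFFF, 0xFFFFFFFF) equals 0xFFFFFFFF * 0xFFFFFFFF, and
mul_signed(-2, 3) equals -6 reduced modulo 2**64.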


5.7.4 Corner Turning Buffer Using Multiply Step 


Multiply Step In Support Of Corner Turning Buffer For Image 
Processing 


The following code fragment shows an implementation of an 8 by 8 bit corner
turning buffer in the local register file. This supports bit-plane image
rotation by 90 degrees. The implementation uses the register file to hold and
manipulate the lowest level of the data structure, and uses the data cache to
reduce accesses to the larger image plane. The multiply step is used for its
ability to couple information from one register to another in a single step, in
a way not expected from its main purpose.


The total image plane is divided into 8 by 8 bit blocks. Blocks are accessed in
groups of 4 that rotate into corresponding positions on edges square to each
other. These form concentric squares.


Each byte of a block is loaded into the Y register and controls multiply steps
with a constant (a 1 in bit 15), turning local registers 0 to 7 into a corner
turning buffer. The constant remains in a fixed position, but the nominal
partial product keeps shifting to the right, making room for new input. Choosing
a large enough constant allows old processed data to remain in the local
registers long enough to be extracted with a shift whose amount depends on which
processed byte is desired. This allows the storing of results to overlap with
the fetching of new input. To accommodate the differing shift amounts, casing is
used to select one and only one instruction out of a block on each pass. A
delayed control transfer couple is formed with a jump and link immediately
followed in the delay slot by a branch always. The target address of the jump
and link steps backwards by one instruction on each pass. As soon as new data is
removed from the target destination, one byte of the rotated block is stored
there.


FROM this                     TO that
a7 a6 a5 a4 a3 a2 a1 a0       h7 g7 f7 e7 d7 c7 b7 a7
b7 b6 b5 b4 b3 b2 b1 b0       h6 g6 f6 e6 d6 c6 b6 a6
c7 c6 c5 c4 c3 c2 c1 c0       h5 g5 f5 e5 d5 c5 b5 a5
d7 d6 d5 d4 d3 d2 d1 d0       h4 g4 f4 e4 d4 c4 b4 a4
e7 e6 e5 e4 e3 e2 e1 e0       h3 g3 f3 e3 d3 c3 b3 a3
f7 f6 f5 f4 f3 f2 f1 f0       h2 g2 f2 e2 d2 c2 b2 a2
g7 g6 g5 g4 g3 g2 g1 g0       h1 g1 f1 e1 d1 c1 b1 a1
h7 h6 h5 h4 h3 h2 h1 h0       h0 g0 f0 e0 d0 c0 b0 a0


local   a7a6a5a4a3a2a1a0        input 1st byte - ldub
reg
0:0...0a7 x x x x x x x x
1:0...0a6 x x x x x x x x
2:0...0a5 x x x x x x x x
3:0...0a4 x x x x x x x x
4:0...0a3 x x x x x x x x
5:0...0a2 x x x x x x x x
6:0...0a1 x x x x x x x x
7:0...0a0 x x x x x x x x


local   b7b6b5b4b3b2b1b0        input 2nd byte - ldub
reg
0:0...0b7a7 x x x x x x x x
1:0...0b6a6 x x x x x x x x
2:0...0b5a5 x x x x x x x x
3:0...0b4a4 x x x x x x x x
4:0...0b3a3 x x x x x x x x
5:0...0b2a2 x x x x x x x x
6:0...0b1a1 x x x x x x x x
7:0...0b0a0 x x x x x x x x


local   c7c6c5c4c3c2c1c0        input 3rd byte - ldub
reg
0:0...0c7b7a7 x x x x x x x x
1:0...0c6b6a6 x x x x x x x x
2:0...0c5b5a5 x x x x x x x x
3:0...0c4b4a4 x x x x x x x x
4:0...0c3b3a3 x x x x x x x x
5:0...0c2b2a2 x x x x x x x x
6:0...0c1b1a1 x x x x x x x x
7:0...0c0b0a0 x x x x x x x x

        .
        .
        .

local   h7h6h5h4h3h2h1h0        input 8th byte - ldub
reg
0:0...0h7g7f7e7d7c7b7a7 x x x x x x x x <1
1:0...0h6g6f6e6d6c6b6a6 x x x x x x x x
2:0...0h5g5f5e5d5c5b5a5 x x x x x x x x
3:0...0h4g4f4e4d4c4b4a4 x x x x x x x x
4:0...0h3g3f3e3d3c3b3a3 x x x x x x x x
5:0...0h2g2f2e2d2c2b2a2 x x x x x x x x
6:0...0h1g1f1e1d1c1b1a1 x x x x x x x x
7:0...0h0g0f0e0d0c0b0a0 x x x x x x x x

        A7A6A5A4A3A2A1A0        next edge byte 1 - ldub
local   h7g7f7e7d7c7b7a7        output rotated byte 1 - stb <1
reg
0:0...0A7h7g7f7e7d7c7b7a7 x x x x x x x
1:0...0A6h6g6f6e6d6c6b6a6 x x x x x x x <2
2:0...0A5h5g5f5e5d5c5b5a5 x x x x x x x
3:0...0A4h4g4f4e4d4c4b4a4 x x x x x x x
4:0...0A3h3g3f3e3d3c3b3a3 x x x x x x x
5:0...0A2h2g2f2e2d2c2b2a2 x x x x x x x
6:0...0A1h1g1f1e1d1c1b1a1 x x x x x x x
7:0...0A0h0g0f0e0d0c0b0a0 x x x x x x x

        B7B6B5B4B3B2B1B0        next edge byte 2 - ldub
local   h6g6f6e6d6c6b6a6        output rotated byte 2 - stb <2
reg
0:0...0B7A7h7g7f7e7d7c7b7a7 x x x x x x
1:0...0B6A6h6g6f6e6d6c6b6a6 x x x x x x
2:0...0B5A5h5g5f5e5d5c5b5a5 x x x x x x <3
3:0...0B4A4h4g4f4e4d4c4b4a4 x x x x x x
4:0...0B3A3h3g3f3e3d3c3b3a3 x x x x x x
5:0...0B2A2h2g2f2e2d2c2b2a2 x x x x x x
6:0...0B1A1h1g1f1e1d1c1b1a1 x x x x x x
7:0...0B0A0h0g0f0e0d0c0b0a0 x x x x x x

        C7C6C5C4C3C2C1C0        next edge byte 3 - ldub
local   h5g5f5e5d5c5b5a5        output rotated byte 3 - stb <3
reg

        .
        .
        .

0:0...0G7F7E7D7C7B7A7h7g7f7e7d7c7b7a7 x
1:0...0G6F6E6D6C6B6A6h6g6f6e6d6c6b6a6 x
2:0...0G5F5E5D5C5B5A5h5g5f5e5d5c5b5a5 x
3:0...0G4F4E4D4C4B4A4h4g4f4e4d4c4b4a4 x
4:0...0G3F3E3D3C3B3A3h3g3f3e3d3c3b3a3 x
5:0...0G2F2E2D2C2B2A2h2g2f2e2d2c2b2a2 x
6:0...0G1F1E1D1C1B1A1h1g1f1e1d1c1b1a1 x
7:0...0G0F0E0D0C0B0A0h0g0f0e0d0c0b0a0 x <8

        H7H6H5H4H3H2H1H0        next edge byte 8 - ldub
        h0g0f0e0d0c0b0a0        output rotated byte 8 - stb <8

        .
        .
        .
/* INNER LOOP 0 for each square, position, edge, byte */
L1:     ldub    [%i1+%i4],%o1   !get input for next pass
                                !i1 is base of fetch, controlled elsewhere
                                !i4 is pointer to target byte
        mulscc  %l1,%o5,%l1     !finish corner turning with previous input
        mulscc  %l0,%o5,%l0     !garbage 1st time, reg o5 = 2**15
        sra     %i4,4,%i4       !downshift adrs pointer for extract pointer
        mov     %o1,%y          !new input
        jmpl    %g1+%i4,%g0     !for i=7->0
        ba      L2              !select 1 extract result instruction

        !only one srl %lx,z,%o0 done on each pass
        !use of casing keeps code compact while still avoiding self modifying code
        !g1 points to t1

        srl     %l0,8,%o0
        srl     %l1,7,%o0
        srl     %l2,6,%o0
        srl     %l3,5,%o0
        srl     %l4,4,%o0
        srl     %l5,3,%o0
        srl     %l6,2,%o0
t1:     srl     %l7,1,%o0
L2:     sll     %i4,4,%i4       !upshift extract pointer for adrs offset
        stb     %o0,[%i0+%i4]   !store 1 result
                                !i0 is base of store, controlled elsewhere
                                !i0 = i1 3 times out of 4
        mulscc  %l7,%o5,%l7     !start corner turning with new input
        mulscc  %l6,%o5,%l6
        mulscc  %l5,%o5,%l5
        mulscc  %l4,%o5,%l4
        mulscc  %l3,%o5,%l3
        mulscc  %l2,%o5,%l2
                                !dec adrs offset
                                !set N & V = 0
                                !keep left input to multiply partial
                                !product zero


This less obvious use of multiply step and less common use of delayed control 
transfer couple allow efficient implementation of a fast corner turning buffer to 
support bit plane image processing. 
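

The FROM/TO mapping shown earlier can be stated compactly in software. The
following Python sketch is a reference model of the bit rearrangement only. It
does not model the multiply-step technique or the register usage, and the
function name corner_turn is hypothetical. Input rows a to h are given as eight
bytes; output row k collects bit 7-k of every input row, with row h supplying
the most significant bit and row a the least significant bit.

```python
def corner_turn(block):
    """Rotate an 8x8 bit block, given as 8 bytes (row a first), by 90 degrees."""
    if len(block) != 8:
        raise ValueError("block must contain exactly 8 bytes")
    out = []
    for bit in range(7, -1, -1):          # output row k uses input bit 7-k
        byte = 0
        for row in range(7, -1, -1):      # row h first, so it lands in the MSB
            byte = (byte << 1) | ((block[row] >> bit) & 1)
        out.append(byte)
    return out
```

Applying corner_turn four times returns the original block, as expected of a
90-degree rotation.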


The MB86930 SPARClite microcontroller is suitable for a wide range of embedded
controller applications due to its high performance and low unit cost. In
designing a system, several issues and trade-offs must be considered to balance
the needs of performance, low hardware cost, low development cost, and short
time to market. This chapter provides detailed information on some specific
design considerations:


The clock signals and type of clock source

The sizes, types, and interface requirements of the system memory and
peripherals

The possible need for DMA capability and bus arbitration

The possible use of an MB86940 Peripheral Chip for interrupt control, timers,
and USARTs

In-circuit emulation capability

Other hardware implementation issues


6.1 Clocks


Either of two possible clock sources can be used to drive a SPARClite system:
the internal oscillator of the MB86930 processor, or a separate external
oscillator. In the former case, a crystal is connected across inputs XTAL1 and
XTAL2. In the latter case, the clock signal is connected to the XTAL1 input pin;
XTAL2 is left unconnected. Using the internal oscillator has a lower hardware
cost, but is less flexible than using an external oscillator.


There are two clock output signals from the processor, CLKOUT1 and CLKOUT2. 
CLKOUT1 has the same frequency and phase as the internal oscillator or the 
signal applied to XTAL1. CLKOUT2 is the same as CLKOUT1, but phase-shifted 
180 degrees. The rising edge of either CLKOUT1 or CLKOUT2 can be used by the 
external system for timing purposes. 


The output clocks are controlled by a phase-locked loop implemented in the
processor. The phase-locked loop minimizes the skew between the input clock
signal and CLKOUT1, and controls the duty cycles of the output clocks. The input
clock signal applied to XTAL1 can have a relatively wide range of duty cycles.
(See the data sheet for the clock timing specifications.) The duty cycle of the
output clocks is somewhat less than 50%, reflecting the fact that the processor
requires its internal clock phases to have non-overlapping transitions.


The drive capability of the clock output signals is limited. Depending on the 
number of inputs that must be driven and the clock speed, it may be necessary to 
buffer these signals for use elsewhere in the system. To minimize clock skew for 
systems that exceed the drive capability of CLKOUT1 or CLKOUT2, a buffered 
external clock can be used to drive both the processor and the system. 


6.2 Memory and I/O Interfacing 


The SPARClite processor minimizes the need for external logic by providing a
programmable on-chip address decoder and six independent chip-select output
signals. The address decoder compares the current address against the
programmed address ranges, and automatically asserts the appropriate chip-select
signal. The on-chip address decoder is more economical than a separate external
decoder, and also operates faster.


Each programmable address range has an associated wait-state generator, which
generates a Ready signal internally after a programmed number of access cycles.
Either this internal Ready signal or the conventional -READY signal input from
the external memory controller can be used to end the transaction. The processor
can also be programmed to use the internal wait-state generator while allowing
the -READY signal to override the internal count and end the bus cycle sooner.
The internally generated Ready signal is not visible outside the processor.


If you use a single chip-select signal from the processor to select multiple memory 
or I/O devices, all those devices will have the same number of wait states gener- 
ated when they are accessed. Different chip select signals, however, can be indi- 
vidually programmed to different numbers of wait states. 


Any area of memory not mapped to one of the chip selects (-CS5-0) will use the 
external -READY. 
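

The behavior described above can be pictured with a small software model. The
Python sketch below is illustrative only: the ChipSelect record, its fields, and
the decode function are hypothetical and do not reflect the format of the
processor's address-range registers. Each range carries its own wait-state
count, and an address that matches no range falls back to the external -READY
signal.

```python
from dataclasses import dataclass


@dataclass
class ChipSelect:
    index: int        # 0..5, standing for -CS0 to -CS5
    base: int         # first address of the range (hypothetical field)
    size: int         # length of the range in bytes (hypothetical field)
    wait_states: int  # count programmed into this range's wait-state generator


def decode(address, ranges):
    """Return (chip-select index, wait states), or (None, None) if unmapped."""
    for cs in ranges:
        if cs.base <= address < cs.base + cs.size:
            return cs.index, cs.wait_states
    # No chip select matches: the cycle ends on the external -READY signal.
    return None, None
```

Two devices sharing one chip select necessarily share one wait_states value in
this model, which mirrors the restriction noted above.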


6.2.1 Interfacing SRAM 


The address bus, data bus, and chip select signals of the SRAM can be connected 
directly to the address bus, data bus and a chip select of the processor. The output 
enable signal can be generated by gating RD/—WR high and Chip select low to 
produce output enable low. Write enable for the SRAMs requires more consider- 
ation. 


The processor data hold time for a write is specified as zero hold after rising edge 
of clock. RD/—WR hold time at the end of a write operation can be 0 after rising 
edge of clock, or can be held low if the next cycle is also a write. Thus an imple- 
mentation cannot use RD/—WR directly as -WE for the SRAMs. 


Figure 6-1 shows a timing diagram for an example implementation using 2 cycle 
access SRAM running at 40 MHz. It was implemented in a combinatorial PAL 
(see Figure 6-4). Individual -WE signals are generated for each of the 4 bytes in 
the data word. 


Figure 6-1. SRAM Interfacing Example


!clkd   = !clkp1;

!soe_   = rw & !scs_;

!swe3_  = !rw & !as_  & !be3_  & !clkp1
        # !rw & !as_  & !be3_  & !clkd
        # !rw & !scs_ & !swe3_ & clkp1
        # !rw & !scs_ & !swe3_ & clkd;

!swe2_  = !rw & !as_  & !be2_  & !clkp1
        # !rw & !as_  & !be2_  & !clkd
        # !rw & !scs_ & !swe2_ & clkp1
        # !rw & !scs_ & !swe2_ & clkd;

!swe1_  = !rw & !as_  & !be1_  & !clkp1
        # !rw & !as_  & !be1_  & !clkd
        # !rw & !scs_ & !swe1_ & clkp1
        # !rw & !scs_ & !swe1_ & clkd;

!swe0_  = !rw & !as_  & !be0_  & !clkp1
        # !rw & !as_  & !be0_  & !clkd
        # !rw & !scs_ & !swe0_ & clkp1
        # !rw & !scs_ & !swe0_ & clkd;


Clock low and —AS low and —BE low and RD/—WR low cause —WE to be asserted. 
Clock high and —CS low and —BE low and RD/—WR low cause —WE to stay low. 
When clock goes low again, -WE is negated. This way there is sufficient data hold 
time. 
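

The assert, hold, and negate sequence can be stepped through in software. The
Python sketch below is a simplified model for one byte lane. It treats the
delayed clock as identical to the clock (ignoring PAL propagation delay), and it
assumes that -AS is low during the first cycle of the two-cycle write and high
during the second. The function names and the WRITE_PHASES sequence are
hypothetical.

```python
def swe_next(rw, as_, scs_, be_, clk, swe_):
    """Next level of -WE (0 = asserted) for one byte lane."""
    # Start term: write, -AS low, -BE low, clock low.
    start = rw == 0 and as_ == 0 and be_ == 0 and clk == 0
    # Hold term: write, -CS low, -WE already low, clock high.
    hold = rw == 0 and scs_ == 0 and swe_ == 0 and clk == 1
    return 0 if (start or hold) else 1


def trace_write(phases, rw=0):
    """Evaluate -WE over a list of (clock, -AS) half-cycles."""
    swe_, levels = 1, []
    for clk, as_ in phases:
        swe_ = swe_next(rw=rw, as_=as_, scs_=0, be_=0, clk=clk, swe_=swe_)
        levels.append(swe_)
    return levels


# Half-cycles of an assumed two-cycle write: (clock level, -AS level).
WRITE_PHASES = [(1, 0), (0, 0), (1, 1), (0, 1)]
```

Under these assumptions -WE is high in the first half-cycle, low for the next
two, and high again when the clock goes low in the second cycle.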


For this implementation, CLKOUT1 from the processor was used since it has 
better duty cycle control than an oscillator clock. 


6.2.2 Interfacing Page-Mode DRAM 


Interfacing Dynamic RAM requires a DRAM controller for generating RAS and 
CAS (Row Address Strobe and Column Address Strobe), and for handling 
refresh. The DRAM controller is typically implemented as a state machine. The 
DRAM controller and signal interfaces should be designed carefully to accommo- 
date refresh operations and fast page mode access. 


The programmable 16-bit timer provided in the SPARClite processor can be used 
for timing the refresh interval. The timer output signal, -TIMER_OVF (Timer 
Overflow), goes low for a single clock cycle at the end of each timer interval. The 
timer interval is programmed in software, the correct amount of time depending 
on how the refresh operation is implemented. 
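

As a worked example of choosing the timer interval, the Python sketch below
computes the number of clock cycles between single-row refreshes when refreshes
are spread evenly over the refresh period. The DRAM figures used in the example
(512 rows refreshed every 8 ms) are assumptions for illustration and must be
taken from the data sheet of the actual device. The function name is
hypothetical, and the exact value to write to the timer register is not modeled.

```python
def refresh_interval_cycles(clock_hz, rows, refresh_period_us):
    """Clock cycles between row refreshes for evenly distributed refresh."""
    cycles = (clock_hz * refresh_period_us) // (rows * 1_000_000)
    if cycles >= 1 << 16:
        raise ValueError("interval does not fit in a 16-bit timer")
    return cycles
```

At 40 MHz with the assumed figures, refresh_interval_cycles(40_000_000, 512,
8000) gives 625 cycles, which fits comfortably in a 16-bit timer.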


There are two ways to implement the correct number of wait states: either the
processor’s internal wait-state generator can be used, or the DRAM controller can
generate a -READY signal for the processor.


The processor supports fast “page mode” access to DRAM. When the current 
DRAM address is within the same page as the previous DRAM access, the 
—SAME_PAGE (Same-Page Detect) signal is asserted. This tells the DRAM con- 
troller that DRAM can be accessed using CAS only, without selecting a new row 
of the DRAM, saving time. Page-mode accesses thus provide timing advantages 
comparable to the burst-mode accesses of some other processors. 


To take advantage of page hits, RAS is asserted and left asserted to continuously 
select a row. CAS is asserted, one access at a time, to select a memory location in 
that row. Accesses need not be in consecutive locations. As long as each access is 
in the same row, RAS can be left asserted and CAS asserted once to access each 
memory location. RAS remains asserted between accesses. 





The wait-state generator can be programmed to use a different (smaller) number 
of clock cycles for a “page hit” (when the current address is within the same page 
as the previous DRAM access). 


When using the internal wait-state generator instead of the external -READY
signal, the processor has no way of detecting a refresh operation that occurs
during an access. One solution is to have the DRAM controller take control of
the bus during refresh using -BREQ (Bus Request), thereby preventing the
processor from requesting a memory access for the duration of the refresh
operation. The disadvantage of this solution is that the processor is forced to
remain idle. An alternative solution is to disable the internal wait-state
generator and let the DRAM controller generate the -READY signal for all DRAM
accesses.


Figure 6-2 is a simplified state diagram for a DRAM memory controller. Upon 
reset, the state machine starts in the RAS Precharge and Idle state, and remains in 
that state until a memory access or refresh request occurs. 

















[State diagram. States: RAS Precharge and Idle; Refresh; RAS; CAS; Page Wait
(RAS asserted, CAS negated). Transition labels: Refresh Request; Access;
Same_Page Access; New-Page Access or Refresh Request.]

Note: Each state may represent multiple clock cycles


Figure 6-2. Simplified State Diagram for DRAM Controller


If a refresh request occurs, the state machine goes into the Refresh state. (In
practice, this will actually be a number of sequential states.) When the refresh
operation is complete, the state machine returns to the RAS Precharge and Idle
state.


When the processor requests a DRAM memory access, the state machine enters 
the RAS state, in which the RAS signal is asserted to select the row. From there it 
goes to the CAS state, in which the CAS signal is asserted to select the column. At 
this point, data is clocked into the appropriate part and the bus cycle ends. 


From there the state machine enters the Page Wait state, in which it waits for
either another memory access or a refresh request. In this state, RAS is
asserted and CAS is negated. If there is a memory access to the same page of
DRAM (as indicated by the -SAME_PAGE signal), the state machine goes directly to
the CAS state, and CAS is asserted to select the memory location. If there is a
memory access to a different page of DRAM, or if a refresh request occurs, the
state machine goes to the RAS Precharge and Idle state, and from there to the
requested operation. Until one of these events occurs, the state machine waits
with RAS asserted.
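

The transitions described above can be summarized as a next-state function. The
Python sketch below is a simplified model in which each state takes a single
step, whereas a real controller would spend several clock cycles in some states.
The state names and the next_state function are hypothetical, and giving a
refresh request priority over an access request in the idle state is an
assumption, not something the text specifies.

```python
IDLE, REFRESH, RAS, CAS, PAGE_WAIT = "idle", "refresh", "ras", "cas", "page_wait"


def next_state(state, access=False, same_page=False, refresh=False):
    """Return the next controller state for the given request inputs."""
    if state == IDLE:                     # RAS Precharge and Idle
        if refresh:                       # assumed priority: refresh first
            return REFRESH
        return RAS if access else IDLE
    if state == REFRESH:                  # refresh done, back to idle
        return IDLE
    if state == RAS:                      # row selected, now select column
        return CAS
    if state == CAS:                      # data transferred, keep row open
        return PAGE_WAIT
    if state == PAGE_WAIT:                # RAS asserted, CAS negated
        if refresh:
            return IDLE                   # precharge, then refresh from idle
        if access:
            return CAS if same_page else IDLE
        return PAGE_WAIT
    raise ValueError("unknown state: %r" % (state,))
```

A same-page access in the Page Wait state goes straight to the CAS state, which
is where page mode saves time.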


For more information, refer to SPARClite Application Note #1 on DRAM 
interfacing. 


6.2.3 Interfacing EPROM and Other Devices with Slow Turn-off


One characteristic of EPROM memory to consider is its relatively long turn-off 
time—the delay from the negation of the Chip Select input or Output Enable 
input to the three-stating of the data outputs. In high-speed systems, contention 
on the data bus between different peripheral devices can occur, depending on the 
organization of different memory and peripherals in the system. 


When using EPROM in the system (or other memory or I/O devices that are slow 
to turn off), carefully study the timing diagrams in the External Interface chapter 
of this manual and in the data sheet, and determine the worst-case access situa- 
tions. If contention on the data bus can occur, consider adding fast data buffers 
between the EPROM outputs and the system data bus. These data buffers will 
allow the EPROM outputs to be quickly isolated from the data bus at the end of 
an EPROM access cycle. 


The worst-case timing situation typically involves two consecutive loads from
different devices. In back-to-back loads from different devices, there must be
sufficient time for the first device to get off the data bus before the second
device tries to drive its data. A load followed by a store is not critical since
the processor inserts a “dead cycle” in this sequence to allow the external
device to fully relinquish the bus.


6.2.4 Illegal Memory Accesses


The external memory or I/O interface circuit can detect illegal memory accesses 
and prevent the processor from completing such accesses by asserting the -MEXC 
(Memory Exception) and -READY signals. (See Figure 4-2, Load with Exception 
Timing, and Figure 4-4, Store with Exception Timing.) The current bus access is 
invalidated by the assertion of this signal, and the processor ignores the value on 
the data bus in that cycle. An instruction-access or data-access exception trap is 
initiated in the processor, allowing the software to handle the illegal memory 
access. 


The memory-exception mechanism can be used for protection, by preventing 
user-mode accesses to certain regions of the processor’s address space. External 
logic can also be used to detect and signal out-of-range access attempts. 


6.2.5 I/O Interfacing Example: Ethernet Device 


As an example of an I/O device interface, consider the MB86960 Ethernet inter- 
face device, also known as the NICE™ chip, used on the SPARClite Evaluation 
Board. In the evaluation board implementation, a PAL and two data transceivers 
are used to handle the interface. A block diagram of the interface is shown in 
Figure 6-3. 


[Block diagram: MB86930 SPARClite Processor, Data Transceivers, and MB86960
Ethernet Device.]


Figure 6-3. MB86960 Interface Block Diagram


The MB86960 NICE chip is completely asynchronous, has a non-deterministic
access time, and has a long turn-off delay for the data pins. The PAL handles the
synchronization of the control signals (Read, Write, Chip-Select, and Ready)
between the processor and the NICE chip. The two data transceivers are used to
isolate the output pins from the data bus when a data access is complete.
Figure 6-4 is a state diagram for the PAL.


[State diagram. Recoverable labels: !Reset_; else; !cs_ & !rd/wr; !cs_ & rd/wr;
!snrdy_; state RREADY (Ready_=0, nrd_=0); !snrdy_ := !nice_ready_]


Figure 6-4. MB86960 Interface PAL State Diagram


Read and write operations are strobed by the assertion of the signals N_RD and 
N_WR (the read and write input pins of the NICE chip). To ensure that the 
address and the NICE chip Select signals are stable during strobing, the state 
machine waits one clock cycle before asserting N_RD or N_WR. When a transac- 
tion is finished, the NICE chip asserts its N_READY signal. Since N_READY is 
asynchronous, it is synchronized by a flip-flop in the PAL, producing a synchro- 
nized ready signal, which can then be used elsewhere inside the PAL and by the 
processor. 


In a write operation, the synchronized Ready signal causes N_WR to be negated 
and the processor’s -READY signal to be asserted. The data input setup and hold 
times of the NICE chip are based on the transition of the N_WR signal from 
asserted to negated; early negation ensures that there will be enough hold time 
because the processor won’t stop driving the data bus until the next clock cycle. 


In a read operation, the synchronized Ready signal causes the processor’s
-READY signal to be asserted, and on the next cycle, the -READY signal and
N_RD are negated. Since data setup and hold times of the processor are based on
the rising edge of the clock while -READY is asserted, enough hold time is
ensured. The setup time requirement is ensured because there are almost two
clock cycles between N_READY and the processor sampling the data.


In the case of back-to-back reads of the NICE chip, a new cycle can’t start until 
N_READY is negated from the previous cycle. 


The data transceivers are enabled by -CS asserted and -AS negated. Thus, during
the uncertain period at the beginning of a bus cycle, the transceivers are not
driving the data bus.


The byte order for the NICE chip (little-endian) is opposite that of the SPARClite 
processor (big-endian). The byte order is swapped in hardware: SPARClite data 
bits 8-15 connect to NICE bits 0-7, and SPARClite data bits 0-7 connect to NICE 
bits 8-15. The NICE chip can operate in both 8-bit and 16-bit modes. 
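

The effect of the crossed wiring on a 16-bit value can be modeled in one line.
The Python sketch below is only a software picture of the hardware byte-lane
swap described above; the function name is hypothetical.

```python
def swap_byte_lanes(word16):
    """Exchange bits 0-7 with bits 8-15 of a 16-bit value."""
    word16 &= 0xFFFF
    return ((word16 & 0x00FF) << 8) | (word16 >> 8)
```

Because the swap is its own inverse, applying it twice returns the original
value.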


6.3 DMA and Bus Arbitration 


Some systems require support for multiple bus masters, such as for DMA (Direct 
Memory Access). An external device requests control of the bus by asserting the 
—BREQ (Bus Request) signal. External bus requests take precedence over internal 
requests. The processor, upon completing the current bus transaction, three-states 
its bus drivers and asserts -BGRNT (Bus Grant) to indicate that it is relinquishing 
control of the bus. The external device then takes control of the bus. 





Upon completion of the DMA transfer or other bus operation, the external device 
de-asserts the -BREQ signal. The processor responds by de-asserting the -BGRNT 
signal and taking control of the bus, continuing with the next processor transac- 
tion. 


The chip-select logic of the processor does not monitor the address bus and does 
not operate during the time that the bus is granted to another bus master. There- 
fore, an external address decoder should be used to generate the chip select sig- 
nals for the external bus master. Also, the -CS outputs of the processor are held 
high (negated), but not three-stated, while the bus is granted to the external bus 
master. Therefore, for each memory device that is to be accessed by the external 
bus master, an OR gate must be provided at the chip select input to accept the 
signal from either the processor or the external address decoder. An alternative 
method is to not use the —CS signals from the processor at all, and to use the exter- 
nal address decoder all of the time (although the propagation delay for on-board 
chip selects is less). 


A DMA operation that writes to system memory must be designed in such a
manner that it will not modify cached data. Otherwise, the external memory data
would no longer match the data stored in the processor’s cache, resulting in
errors. One way to meet this requirement is to locate the DMA-accessed memory
in an address space that is not cached. The only address spaces that are cached are
the User/Supervisor Instruction and Data spaces, corresponding to ASI (Address
Space Identifier) values 0x8, 0x9, 0xA, and 0xB. Locating the DMA-accessible
memory only in other address spaces (i.e., ASI values 0x10-0xFE) will ensure that
no cached data will be modified.
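

The ASI rule can be captured in a small check, for example in a build-time
script that validates a memory map. The Python sketch below is illustrative and
the names are hypothetical; the ASI values are those given in the paragraph
above.

```python
CACHED_ASIS = frozenset({0x8, 0x9, 0xA, 0xB})  # User/Supervisor Instruction and Data


def is_cached_asi(asi):
    """True if accesses in this address space can be cached."""
    return asi in CACHED_ASIS


def is_dma_safe_asi(asi):
    """True if the ASI lies in the uncached range suggested for DMA memory."""
    return 0x10 <= asi <= 0xFE
```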


Another way to handle this requirement is to use software to invalidate the data
stored in cache when the external memory is modified. The software must keep
track of what is cached and what is being modified. Each time a cached memory
space is modified, the software invalidates the corresponding data stored in
cache, in effect forcing an update to the cache whenever its contents are
out-of-date.


Alternatively, embedded control task monitor software can be used to control the 
dynamic assignment of buffers between DMA inputs and outputs and processing 
inputs and outputs. The software can then ensure that no DMA transfers involve 
currently cached memory. 


6.4 MB86940 Peripheral Chip


The MB86940 is an optional peripheral device that interfaces directly with the
MB86930 SPARClite processor, and operates at the same clock speeds. It provides
a variety of support features: a 15-level interrupt controller, a set of four
counter/timers, and a set of two USARTs. With an MB86940 Peripheral Chip in the
system, you can use any or all of these support features. The Peripheral Chip is
a low-power CMOS device available in either a 120-pin PQFP or a 135-pin CPGA
package.


A brief overview of the Peripheral Chip features is provided below. For detailed 
information on the chip functions, interfacing, and specifications, refer to the 
MB86940 User’s Guide. 


6.4.1 Interrupt Control 


The interrupt controller on the Peripheral Chip has 15 separate interrupt-request 
inputs. The trigger conditions and active signal levels are individually program- 
mable. The interrupt controller arbitrates the pending requests, and based on the 
SPARClite priority levels, issues an asynchronous interrupt to the processor. The 
interrupt is held pending until acknowledged by the processor. 


The SPARClite processor has four interrupt inputs (IRL3-IRL0). The value on
these pins defines the level of the external interrupt. The value 0000 indicates no
pending interrupt, while 1111 forces a non-maskable interrupt. Intermediate
values indicate maskable interrupts with the corresponding priority levels.
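

The encoding of the four interrupt inputs can be expressed as a simple decode.
The Python sketch below is illustrative; the function name and the returned
labels are hypothetical.

```python
def decode_irl(irl):
    """Classify a 4-bit IRL3-IRL0 value as (kind, priority level)."""
    if not 0 <= irl <= 0b1111:
        raise ValueError("IRL is a 4-bit value")
    if irl == 0b0000:
        return ("none", 0)                # no interrupt pending
    if irl == 0b1111:
        return ("non-maskable", 15)       # cannot be masked
    return ("maskable", irl)              # levels 1 to 14
```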


6.4.2 Counter/Timers


The Peripheral Chip has four general-purpose 16-bit counter/timers. Each timer 
can be individually programmed to operate in any of several modes: time-out 
interrupt mode, rate generation mode, square wave generation mode, external- 
trigger one-shot mode, and software-trigger one-shot mode. Each timer can be 
reloaded at any time. Two prescalers are provided to optionally reduce the oper- 
ating frequency of the timers. 


6.4.3 USARTs 


Two USART (Universal Synchronous/Asynchronous Receiver/Transmitter)
channels are provided in the Peripheral Chip. The channels are individually pro- 
grammable. Each channel is capable of sending and receiving serial data at rates 
up to 64K baud in synchronous mode and up to 19.2K baud in asynchronous 
mode. Data can be five to eight bits per character. 





6.5 In-Circuit Emulation 


SPARClite processors have ten pins used for in-circuit emulation: four emulator 
status/data bits, four emulator data bits, an emulator break request line, and an 
emulator enable pin. All of these pins should be left unconnected in the design for 
proper system operation. 


To allow for compatibility with an in-circuit emulator, the system’s reset circuit 
should be designed to allow the in-circuit emulator to take control of the -RESET 
signal. For example, a jumper in the -RESET input line close to the processor can 
be included, allowing the normal Reset circuit to be easily disconnected from the 
processor. 


To simplify the task of emulating the processor, especially on boards that do
not socket the processor, it is recommended that the processor’s emulator pins
be connected to a standard-format 20-pin connector. Access to these pins allows
the emulator to take full control of the processor as well as to trace processor
activity. If this connector is included on production boards, an emulator can be
used for board diagnostics and maintenance later in the product life cycle. For
more information, contact Fujitsu Microelectronics’ Advanced Products Division
or your emulator vendor.


6.6 Physical Design Issues


Multiple VCC and VSS pins are provided on the SPARClite device for power and
ground connections. The circuit board should be designed using separate power
and ground planes for power distribution. Every VCC pin must be connected to
the power plane, and every VSS pin must be connected to the ground plane. Any
pins identified in the data sheet as “NC” must be left unconnected in the system.


To minimize the effects of spikes on output transitions, a generous amount of 
decoupling capacitance should be connected near the MB86930 device. It is 
important to use low-inductance capacitors and interconnections, especially in 
high-speed systems. Inductance can be minimized by making the board traces as 
short as possible between the processor and the decoupling capacitors. 


For reliable operation, alternate bus masters must drive any signals that are
three-stated by the processor when the processor grants control of the bus. Among
the signals that must be driven are -LOCK, ADR31 through ADR2, ASI7 through
ASI0, -BE3 through -BE0, -AS, and RD/-WR. These pins are normally driven by
the processor during active and idle bus states, and don’t require external
pullups. D31 through D0 should be pulled up.


When designing the system, take into account the amount of load on the signal
lines driven by the processor. The standard load is specified in the data sheet. If
the actual load in the system is larger, the system may not be able to operate at the
speeds specified in the data sheet timing diagrams, making it necessary to use a
slower clock or to use buffers for the heavily loaded signals.


Instruction Set


This chapter presents the SPARClite processor instruction set. Sections discussing 
recommended assembly language syntax, a table of instructions listed by opcode, 
and an alphabetized instruction set reference are included. 


7.1 Suggested Assembly Language Syntax 


This section provides guidelines that describe the typical SPARC syntax accepted
by most SPARC assemblers. It is intended as a guide to understanding the code
examples shown throughout this manual. Consult your assembler manual for a
complete syntax description.


7.1.1 Register Names


reg      A reg is an integer register name¹. It can have one of the following values:

             %r0 ... %r31
             %g0 ... %g7     (global registers; same as %r0 ... %r7)
             %o0 ... %o7     (out registers; same as %r8 ... %r15)
             %l0 ... %l7     (local registers; same as %r16 ... %r23)
             %i0 ... %i7     (in registers; same as %r24 ... %r31)
             %fp             (frame pointer; conventionally same as %i6)
             %sp             (stack pointer; conventionally same as %o6)

         Subscripts further identify the placement of the operand in the binary
         instruction as one of the following:

             reg_rs1         (rs1 field)
             reg_rs2         (rs2 field)
             reg_rd          (rd field)


asr_reg  An asr_reg is an Ancillary State Register name². It can have one of the
         following values:

             %asr1 ... %asr31

         Subscripts further identify the placement of the operand in the binary
         instruction as one of the following:

             asr_reg_rs1     (rs1 field)
             asr_reg_rd      (rd field)


7.1.2 Special Symbol Names 


The symbol names and the registers or operators to which they refer are as
follows:

    %psr    Processor State Register
    %wim    Window Invalid Mask Register
    %tbr    Trap Base Register
    %y      Y register
    %hi     Unary operator which extracts high 22 bits of its operand
    %lo     Unary operator which extracts low 10 bits of its operand


1. In actual usage, the %sp, %fp, %gn, %on, %ln, and %in forms are preferred over %rn.
2. The MB86930 allows only %asr17.
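
The two unary operators split a 32-bit value into a 22-bit upper part and a
10-bit lower part. The following Python sketch models that split; the function
names hi, lo, and rebuild are chosen for this illustration only and are not
assembler features.

```python
def hi(value):
    """Model of %hi: the high 22 bits of a 32-bit operand."""
    return (value & 0xFFFFFFFF) >> 10

def lo(value):
    """Model of %lo: the low 10 bits of a 32-bit operand."""
    return value & 0x3FF

def rebuild(value):
    """Recombine both parts, as a SETHI followed by an OR would."""
    return (hi(value) << 10) | lo(value)
```

For 0x12345678, hi gives 0x48D15 and lo gives 0x278; shifting the high part
left by ten bits and OR-ing in the low part restores the original value.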


7.1.3 Values

Some instructions use operands comprising values as follows: 
simm13 A signed immediate constant that can be represented in 13 bits 
const22 A constant that can be represented in 22 bits 
asi An alternate address space identifier (0 to 255) 
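
The operand ranges can be checked with a short Python sketch. The helper names
are hypothetical. The signed 13-bit and 0 to 255 ranges follow from the
definitions above; treating const22 as an unsigned 22-bit pattern is an
assumption of this sketch, since the text does not state its signedness.

```python
def fits_simm13(value):
    """True if value fits a signed 13-bit immediate (-4096 to 4095)."""
    return -4096 <= value <= 4095

def fits_const22(value):
    """True if value fits in 22 bits (assumed unsigned for this sketch)."""
    return 0 <= value < (1 << 22)

def valid_asi(value):
    """True if value is a legal alternate address space identifier."""
    return 0 <= value <= 255
```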


7.1.4 Labels 


A label is a sequence of characters composed of alphabetic letters (a-z, A-Z;
upper and lower case are distinct), underscores (_), dollar signs ($), periods (.),
and decimal digits (0-9). A label may contain decimal digits, but cannot begin
with one.
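
The label rule maps directly onto a regular expression. The Python sketch
below is one way to express it; the names LABEL_RE and is_label are invented
for this example, and a real assembler may impose further limits (such as a
maximum length) that are not described here.

```python
import re

# First character: letter, underscore, dollar sign, or period.
# Remaining characters: the same set plus decimal digits.
LABEL_RE = re.compile(r"[A-Za-z_$.][A-Za-z0-9_$.]*\Z")

def is_label(text):
    """Return True if text is a valid label under the rule above."""
    return LABEL_RE.match(text) is not None
```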


7.1.5 Comments 


Two types of comments are accepted by most SPARC assemblers: C-style
“/*...*/” comments (which may span multiple lines), and “!...” comments, which
extend from the “!” to the end of the line.





7.2 Syntax Design


The suggested SPARC assembly language syntax is designed so that:


• The destination operand (if any) is consistently specified as the last (right-
most) operand in an assembly language statement.


• A reference to the contents of a memory location (in a Load, Store, or SWAP
instruction) is always indicated by square brackets ([]). A reference to the
address of a memory location (such as in a JMPL, CALL, or SETHI) is specified
directly, without square brackets.


7.3 Synthetic Instructions 


Table 7-1 describes the mapping of a set of synthetic (or “pseudo”) instructions to 
actual SPARC instructions. These synthetic instructions may be provided in a 
SPARC assembler for the convenience of assembly language programmers. 


Note that synthetic instructions should not be confused with “pseudo-ops”,
which typically provide information to the assembler but do not generate
instructions. Synthetic instructions always generate instructions; they provide
more mnemonic syntax for standard SPARC instructions.


Table 7-1: Mapping of Synthetic Instructions to SPARC Instructions


Synthetic Instruction             SPARC Instruction(s)                       Comment

cmp    reg_rs1, reg_rs2           subcc   reg_rs1, reg_rs2, %g0              compare
cmp    reg_rs1, simm13            subcc   reg_rs1, simm13, %g0               compare

jmp    reg_rs1 + reg_rs2          jmpl    reg_rs1 + reg_rs2, %g0
jmp    reg_rs1 +/- simm13         jmpl    reg_rs1 +/- simm13, %g0
call   reg_rs1 + reg_rs2          jmpl    reg_rs1 + reg_rs2, %o7
call   reg_rs1 +/- simm13         jmpl    reg_rs1 +/- simm13, %o7

tst    reg_rs2                    orcc    %g0, reg_rs2, %g0                  test

ret                               jmpl    %i7+8, %g0                         return from subroutine
retl                              jmpl    %o7+8, %g0                         return from leaf subroutine

restore                           restore %g0, %g0, %g0                      trivial restore
save                              save    %g0, %g0, %g0                      trivial save
                                                                             (Warning: trivial save should
                                                                             only be used in kernel code!)

set    value, reg_rd              sethi   %hi(value), reg_rd                 (when ((value & 0x1fff) == 0))
                                    or
                                  or      %g0, value, reg_rd                 (when -4096 <= value <= 4095)
                                    or
                                  sethi   %hi(value), reg_rd                 (otherwise)
                                  or      reg_rd, %lo(value), reg_rd
                                                                             Warning: do not use set in the
                                                                             delay slot of a DCTI.

not    reg_rs1, reg_rd            xnor    reg_rs1, %g0, reg_rd               one’s complement
not    reg_rd                     xnor    reg_rd, %g0, reg_rd                one’s complement
neg    reg_rs2, reg_rd            sub     %g0, reg_rs2, reg_rd               two’s complement
neg    reg_rd                     sub     %g0, reg_rd, reg_rd                two’s complement

inc    reg_rd                     add     reg_rd, 1, reg_rd                  increment by 1
inc    simm13, reg_rd             add     reg_rd, simm13, reg_rd             increment by const13
inccc  reg_rd                     addcc   reg_rd, 1, reg_rd                  increment by 1 and set icc
inccc  simm13, reg_rd             addcc   reg_rd, simm13, reg_rd             increment by const13 and set icc

dec    reg_rd                     sub     reg_rd, 1, reg_rd                  decrement by 1
dec    simm13, reg_rd             sub     reg_rd, simm13, reg_rd             decrement by const13
deccc  reg_rd                     subcc   reg_rd, 1, reg_rd                  decrement by 1 and set icc
deccc  simm13, reg_rd             subcc   reg_rd, simm13, reg_rd             decrement by const13 and set icc

btst   reg_rs2, reg_rs1           andcc   reg_rs1, reg_rs2, %g0              bit test
btst   simm13, reg_rs1            andcc   reg_rs1, simm13, %g0               bit test
bset   reg_rs2, reg_rd            or      reg_rd, reg_rs2, reg_rd            bit set
bset   simm13, reg_rd             or      reg_rd, simm13, reg_rd             bit set
bclr   reg_rs2, reg_rd            andn    reg_rd, reg_rs2, reg_rd            bit clear
bclr   simm13, reg_rd             andn    reg_rd, simm13, reg_rd             bit clear
btog   reg_rs2, reg_rd            xor     reg_rd, reg_rs2, reg_rd            bit toggle
btog   simm13, reg_rd             xor     reg_rd, simm13, reg_rd             bit toggle

clr    reg_rd                     or      %g0, %g0, reg_rd                   clear (zero) register
clrb   [reg_rs1 + reg_rs2]        stb     %g0, [reg_rs1 + reg_rs2]           clear byte
clrb   [reg_rs1 +/- simm13]       stb     %g0, [reg_rs1 +/- simm13]          clear byte
clrh   [reg_rs1 + reg_rs2]        sth     %g0, [reg_rs1 + reg_rs2]           clear halfword
clrh   [reg_rs1 +/- simm13]       sth     %g0, [reg_rs1 +/- simm13]          clear halfword
clr    [reg_rs1 + reg_rs2]        st      %g0, [reg_rs1 + reg_rs2]           clear word
clr    [reg_rs1 +/- simm13]       st      %g0, [reg_rs1 +/- simm13]          clear word

mov    reg_rs2, reg_rd            or      %g0, reg_rs2, reg_rd
mov    simm13, reg_rd             or      %g0, simm13, reg_rd
mov    %y, reg_rd                 rd      %y, reg_rd
mov    %asrn, reg_rd              rd      %asrn, reg_rd
mov    %psr, reg_rd               rd      %psr, reg_rd
mov    %wim, reg_rd               rd      %wim, reg_rd
mov    %tbr, reg_rd               rd      %tbr, reg_rd
mov    reg_rs1, %y                wr      reg_rs1, %g0, %y
mov    simm13, %y                 wr      %g0, simm13, %y
mov    reg_rs1, asr_reg           wr      reg_rs1, %g0, asr_reg
mov    simm13, asr_reg            wr      %g0, simm13, asr_reg
mov    reg_rs1, %psr              wr      reg_rs1, %g0, %psr
mov    simm13, %psr               wr      %g0, simm13, %psr
mov    reg_rs1, %wim              wr      reg_rs1, %g0, %wim
mov    simm13, %wim               wr      %g0, simm13, %wim
mov    reg_rs1, %tbr              wr      reg_rs1, %g0, %tbr
mov    simm13, %tbr               wr      %g0, simm13, %tbr
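
The three-way expansion of the set synthetic instruction in Table 7-1 can be
sketched in Python as follows. This is only a model of the selection rule, not
the implementation of any particular assembler; the function name expand_set
and the textual form of its output are assumptions made for illustration.

```python
def expand_set(value, rd):
    """Return the SPARC instruction(s) that "set value, rd" stands for."""
    word = value & 0xFFFFFFFF
    if word & 0x1FFF == 0:
        # Low bits are all zero, so SETHI alone builds the value.
        return ["sethi %hi(0x{:x}), {}".format(word, rd)]
    if -4096 <= value <= 4095:
        # The value fits a signed 13-bit immediate.
        return ["or %g0, {}, {}".format(value, rd)]
    # General case: load the high 22 bits, then OR in the low 10 bits.
    return ["sethi %hi(0x{:x}), {}".format(word, rd),
            "or {}, %lo(0x{:x}), {}".format(rd, word, rd)]
```

Because the general case produces two instructions, the warning in the table
applies: a set in the delay slot of a control-transfer instruction would leave
only its first instruction in the slot.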


7.4 Binary Opcodes


The following table maps binary opcodes to SPARC instruction mnemonics. In
the table, the 32 bits that make up an instruction are divided into four fields:
field 1 is bits 31-30, field 2 is bits 24-19, field 3 is bits 29-25, and field 4 is
bits 13-5. When using the table, look first for a match in field 1, then for a
match in field 2, followed by fields 3 and 4, until the desired mnemonic is found.


Table 7-2: SPARC Instructions Sorted by Opcode

















    Field 1    Field 3    Field 2    Field 4
    (31-30)    (29-25)    (24-19)    (13-5)       Instruction Mnemonic

    00         xxxxx      000xxx     xxxxxxxxx    UNIMP
    00         x0000      010xxx     xxxxxxxxx    BN
    00         x0001      010xxx     xxxxxxxxx    BE
    00         x0010      010xxx     xxxxxxxxx    BLE
    00         x0011      010xxx     xxxxxxxxx    BL
    00         x0100      010xxx     xxxxxxxxx    BLEU
    00         x0101      010xxx     xxxxxxxxx    BCS
    00         x0110      010xxx     xxxxxxxxx    BNEG
    00         x0111      010xxx     xxxxxxxxx    BVS
    00         x1000      010xxx     xxxxxxxxx    BA
    00         x1001      010xxx     xxxxxxxxx    BNE
    00         x1010      010xxx     xxxxxxxxx    BG
    00         x1011      010xxx     xxxxxxxxx    BGE
    00         x1100      010xxx     xxxxxxxxx    BGU
    00         x1101      010xxx     xxxxxxxxx    BCC
    00         x1110      010xxx     xxxxxxxxx    BPOS
    00         x1111      010xxx     xxxxxxxxx    BVC
    00         xxxxx      100xxx     xxxxxxxxx    SETHI
    00         00000      100xxx     xxxxxxxxx    NOP
    00         x0000      110xxx     xxxxxxxxx    FBN
    00         x0001      110xxx     xxxxxxxxx    FBNE
    00         x0010      110xxx     xxxxxxxxx    FBLG
    00         x0011      110xxx     xxxxxxxxx    FBUL
    00         x0100      110xxx     xxxxxxxxx    FBL
    00         x0101      110xxx     xxxxxxxxx    FBUG
    00         x0110      110xxx     xxxxxxxxx    FBG
    00         x0111      110xxx     xxxxxxxxx    FBU
    00         x1000      110xxx     xxxxxxxxx    FBA
    00         x1001      110xxx     xxxxxxxxx    FBE
    00         x1010      110xxx     xxxxxxxxx    FBUE
    00         x1011      110xxx     xxxxxxxxx    FBGE
    00         x1100      110xxx     xxxxxxxxx    FBUGE
    00         x1101      110xxx     xxxxxxxxx    FBLE
    00         x1110      110xxx     xxxxxxxxx    FBULE
    00         x1111      110xxx     xxxxxxxxx    FBO
    00         x0001      111xxx     xxxxxxxxx    CB123
    00         x0010      111xxx     xxxxxxxxx    CB12
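

Table 7-2: SPARC Instructions Sorted by Opcode (Continued)


    Field 1    Field 3    Field 2    Field 4
    (31-30)    (29-25)    (24-19)    (13-5)       Instruction Mnemonic

    00         x0011      111xxx     xxxxxxxxx    CB13
    00         x0100      111xxx     xxxxxxxxx    CB1
    00         x0101      111xxx     xxxxxxxxx    CB23
    00         x0110      111xxx     xxxxxxxxx    CB2
    00         x0111      111xxx     xxxxxxxxx    CB3
    00         x1000      111xxx     xxxxxxxxx    CBA
    00         x1001      111xxx     xxxxxxxxx    CB0
    00         x1010      111xxx     xxxxxxxxx    CB03
    00         x1011      111xxx     xxxxxxxxx    CB02
    00         x1100      111xxx     xxxxxxxxx    CB023
    00         x1101      111xxx     xxxxxxxxx    CB01
    00         x1110      111xxx     xxxxxxxxx    CB013
    00         x1111      111xxx     xxxxxxxxx    CB012
    01         xxxxx      xxxxxx     xxxxxxxxx    CALL
    10         xxxxx      000000     xxxxxxxxx    ADD
    10         xxxxx      000001     xxxxxxxxx    AND
    10         xxxxx      000010     xxxxxxxxx    OR
    10         xxxxx      000011     xxxxxxxxx    XOR
    10         xxxxx      000100     xxxxxxxxx    SUB
    10         xxxxx      000101     xxxxxxxxx    ANDN
    10         xxxxx      000110     xxxxxxxxx    ORN
    10         xxxxx      000111     xxxxxxxxx    XNOR
    10         xxxxx      001000     xxxxxxxxx    ADDX
    10         xxxxx      001010     xxxxxxxxx    UMUL
    10         xxxxx      001011     xxxxxxxxx    SMUL
    10         xxxxx      001100     xxxxxxxxx    SUBX
    10         xxxxx      001110     xxxxxxxxx    UDIV
    10         xxxxx      001111     xxxxxxxxx    SDIV
    10         xxxxx      010000     xxxxxxxxx    ADDcc
    10         xxxxx      010001     xxxxxxxxx    ANDcc
    10         xxxxx      010010     xxxxxxxxx    ORcc
    10         xxxxx      010011     xxxxxxxxx    XORcc
    10         xxxxx      010100     xxxxxxxxx    SUBcc
    10         xxxxx      010101     xxxxxxxxx    ANDNcc
    10         xxxxx      010110     xxxxxxxxx    ORNcc
    10         xxxxx      010111     xxxxxxxxx    XNORcc
    10         xxxxx      011000     xxxxxxxxx    ADDXcc
    10         xxxxx      011010     xxxxxxxxx    UMULcc
    10         xxxxx      011011     xxxxxxxxx    SMULcc
    10         xxxxx      011100     xxxxxxxxx    SUBXcc
    10         xxxxx      011101     xxxxxxxxx    DIVScc
    10         xxxxx      011110     xxxxxxxxx    UDIVcc
    10         xxxxx      011111     xxxxxxxxx    SDIVcc
    10         xxxxx      100000     xxxxxxxxx    TADDcc
    10         xxxxx      100001     xxxxxxxxx    TSUBcc
    10         xxxxx      100010     xxxxxxxxx    TADDccTV
    10         xxxxx      100011     xxxxxxxxx    TSUBccTV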


Table 7-2: SPARC Instructions Sorted by Opcode (Continued)


    Field 1    Field 3    Field 2    Field 4
    (31-30)    (29-25)    (24-19)    (13-5)       Instruction Mnemonic

    10         xxxxx      100100     xxxxxxxxx    MULScc
    10         xxxxx      100101     xxxxxxxxx    SLL
    10         xxxxx      100110     xxxxxxxxx    SRL
    10         xxxxx      100111     xxxxxxxxx    SRA
    10         xxxxx      101000     xxxxxxxxx    RDASR
    10         xxxxx      101000     xxxxxxxxx    RDY
    10         xxxxx      101001     xxxxxxxxx    RDPSR
    10         xxxxx      101010     xxxxxxxxx    RDWIM
    10         xxxxx      101011     xxxxxxxxx    RDTBR
    10         xxxxx      101100     xxxxxxxxx    SCAN
    10         xxxxx      110000     xxxxxxxxx    WRASR
    10         00000      110000     xxxxxxxxx    WRY
    10         xxxxx      110001     xxxxxxxxx    WRPSR
    10         xxxxx      110010     xxxxxxxxx    WRWIM
    10         xxxxx      110011     xxxxxxxxx    WRTBR
    10         xxxxx      110100     011000111    FqTOs
    10         xxxxx      110100     001101001    FsMULd
    10         xxxxx      110100     001101110    FdMULq
    10         xxxxx      110100     011001100    FiTOq
    10         xxxxx      110100     011010010    FdTOi
    10         xxxxx      110100     011010011    FqTOi
    10         xxxxx      110100     011010001    FsTOi
    10         xxxxx      110100     011001101    FsTOq
    10         xxxxx      110100     001000010    FADDd
    10         xxxxx      110100     001000011    FADDq
    10         xxxxx      110100     000101001    FSQRTs
    10         xxxxx      110100     001000101    FSUBs
    10         xxxxx      110100     001001010    FMULd
    10         xxxxx      110100     001000110    FSUBd
    10         xxxxx      110100     001001011    FMULq


Table 7-2: SPARC Instructions Sorted by Opcode (Continued)


    Field 1    Field 3    Field 2    Field 4
    (31-30)    (29-25)    (24-19)    (13-5)       Instruction Mnemonic

    10         xxxxx      110100     001001011    FMULq
    10         xxxxx      110100     001001001    FMULs
    10         xxxxx      110100     001000111    FSUBq
    10         xxxxx      110101     001010111    FCMPEq
    10         xxxxx      110101     001010001    FCMPs
    10         xxxxx      110101     001010011    FCMPq
    10         xxxxx      110101     001010110    FCMPEd
    10         xxxxx      110101     001010101    FCMPEs
    10         xxxxx      110101     001010010    FCMPd
    10         xxxxx      110110     xxxxxxxxx    CPop1
    10         xxxxx      110111     xxxxxxxxx    CPop2
    10         xxxxx      111000     xxxxxxxxx    JMPL
    10         xxxxx      111001     xxxxxxxxx    RETT
    10         x0000      111010     xxxxxxxxx    TN
    10         x0001      111010     xxxxxxxxx    TE
    10         x0010      111010     xxxxxxxxx    TLE
    10         x0011      111010     xxxxxxxxx    TL
    10         x0100      111010     xxxxxxxxx    TLEU
    10         x0101      111010     xxxxxxxxx    TCS
    10         x0110      111010     xxxxxxxxx    TNEG
    10         x0111      111010     xxxxxxxxx    TVS
    10         x1000      111010     xxxxxxxxx    TA
    10         x1001      111010     xxxxxxxxx    TNE
    10         x1010      111010     xxxxxxxxx    TG
    10         x1011      111010     xxxxxxxxx    TGE
    10         x1100      111010     xxxxxxxxx    TGU
    10         x1101      111010     xxxxxxxxx    TCC
    10         x1110      111010     xxxxxxxxx    TPOS
    10         x1111      111010     xxxxxxxxx    TVC
    10         xxxxx      111011     xxxxxxxxx    FLUSH
    10         xxxxx      111100     xxxxxxxxx    SAVE
    10         xxxxx      111101     xxxxxxxxx    RESTORE
    11         xxxxx      000000     xxxxxxxxx    LD
    11         xxxxx      000001     xxxxxxxxx    LDUB
    11         xxxxx      000010     xxxxxxxxx    LDUH
    11         xxxxx      000011     xxxxxxxxx    LDD
    11         xxxxx      000100     xxxxxxxxx    ST
    11         xxxxx      000101     xxxxxxxxx    STB
    11         xxxxx      000110     xxxxxxxxx    STH
    11         xxxxx      000111     xxxxxxxxx    STD
    11         xxxxx      001001     xxxxxxxxx    LDSB
    11         xxxxx      001010     xxxxxxxxx    LDSH
    11         xxxxx      001101     xxxxxxxxx    LDSTUB
    11         xxxxx      001111     xxxxxxxxx    SWAP
    11         xxxxx      010000     xxxxxxxxx    LDA
    11         xxxxx      010001     xxxxxxxxx    LDUBA
    11         xxxxx      010010     xxxxxxxxx    LDUHA


Table 7-2: SPARC Instructions Sorted by Opcode (Continued)











Instruction Mnemonic 


010011 
010100 


STC 


† These instructions are not implemented in hardware.
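
The field split described at the start of Section 7.4 can be sketched in
Python. The helper names bits and opcode_fields are assumptions made for this
illustration; only the bit positions come from the text.

```python
def bits(word, high, low):
    """Extract bits high..low (inclusive) of a 32-bit instruction word."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)

def opcode_fields(word):
    """Split an instruction word into the four lookup fields of Table 7-2."""
    return {
        "field1": bits(word, 31, 30),
        "field2": bits(word, 24, 19),
        "field3": bits(word, 29, 25),
        "field4": bits(word, 13, 5),
    }
```

For the word 0xA6044012, this yields field 1 = 10, field 2 = 000000, and
field 3 = 10011. Following the lookup order, field 1 and field 2 select the ADD
entry, with field 3 holding the destination register number.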
















7.5 Instruction Set 


This section provides a reference for all instructions supported in hardware on the
SPARClite MB86930. For additional information on the instructions, refer to
Chapter 2, “Programmer's Model”, and for code examples, to Chapter 5,
“Programming Considerations”.


ADD                                                                          ADD

Add


Description:


Computes either “r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)”
if the i field is one, and places the result in the destination specified by the rd field.


Format:
    31 30 29    25 24     19 18    14  13  12            5 4      0
    | 10 |   rd   |  000000  |  rs1   | i=0 | unused (zero) | rs2  |

    31 30 29    25 24     19 18    14  13  12                     0
    | 10 |   rd   |  000000  |  rs1   | i=1 |       simm13         |


Syntax:
    add     reg_rs1, reg_rs2, reg_rd
    add     reg_rs1, immediate, reg_rd
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    mov     2, %l1
    mov     4, %l2
    add     %l1, %l2, %l3       ! %l3 = 6


ADDcc                                                                      ADDcc

Add and modify icc


Description:


Computes either “r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)”
if the i field is one, and places the result in the destination specified by the rd field.


ADDcc modifies the integer condition codes.


Format:
    31 30 29    25 24     19 18    14  13  12            5 4      0
    | 10 |   rd   |  010000  |  rs1   | i=0 | unused (zero) | rs2  |

    31 30 29    25 24     19 18    14  13  12                     0
    | 10 |   rd   |  010000  |  rs1   | i=1 |       simm13         |


Syntax:
    addcc   reg_rs1, reg_rs2, reg_rd
    addcc   reg_rs1, immediate, reg_rd
Traps:
    (none)


Condition Code Modified:


    n, z, v, c
Example:
mov Lg) lca. 
addcc CLly SO¢ volo ! “ep l3o="=3 


! nzvc=1000 


Instruction Set - Add and modify icc 


ADDX                                                                        ADDX

Add with carry


Description:


Computes either “r[rs1] + r[rs2] + c” if the i field is zero, or “r[rs1] +
sign_ext(simm13) + c” if the i field is one, and places the result in the destination
specified by the rd field.


Format:
    31 30 29    25 24     19 18    14  13  12            5 4      0
    | 10 |   rd   |  001000  |  rs1   | i=0 | unused (zero) | rs2  |

    31 30 29    25 24     19 18    14  13  12                     0
    | 10 |   rd   |  001000  |  rs1   | i=1 |       simm13         |


Syntax:
    addx    reg_rs1, reg_rs2, reg_rd
    addx    reg_rs1, immediate, reg_rd
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    mov     -1, %l1
    addcc   %l1, %l1, %l2
    addx    %g0, %g0, %l3       ! %l3 = 1


ADDXcc                                                                    ADDXcc

Add with carry and modify icc


Description:


Computes either “r[rs1] + r[rs2] + c” if the i field is zero, or “r[rs1] +
sign_ext(simm13) + c” if the i field is one, and places the result in the destination
specified by the rd field.


ADDXcc modifies the integer condition codes.


Format:
    31 30 29    25 24     19 18    14  13  12            5 4      0
    | 10 |   rd   |  011000  |  rs1   | i=0 | unused (zero) | rs2  |

    31 30 29    25 24     19 18    14  13  12                     0
    | 10 |   rd   |  011000  |  rs1   | i=1 |       simm13         |


Syntax:
    addxcc  reg_rs1, reg_rs2, reg_rd
    addxcc  reg_rs1, immediate, reg_rd
Traps:
    (none)


Condition Code Modified:


    n, z, v, c
Example:
    mov     -1, %l1
mov Sy. ils 
    addcc   %l1, %l1, %l2       ! nzvc=1001
adaxce,  615407-6.0 ! $13=0, nzvc=0101 


Instruction Set - Add with carry and modify icc 
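
The arithmetic and the condition-code results of the four add instructions can
be checked with a small Python model. This is a sketch under the usual
two's-complement definitions of the n, z, v, and c bits; the function name
add_icc is an assumption of this example. ADD and ADDX use only the result,
while ADDcc and ADDXcc also write the returned condition codes.

```python
MASK32 = 0xFFFFFFFF

def add_icc(a, b, carry_in=0):
    """Return (result, (n, z, v, c)) for a 32-bit add with optional carry-in."""
    a &= MASK32
    b &= MASK32
    total = a + b + carry_in
    result = total & MASK32
    n = result >> 31                      # sign bit of the result
    z = 1 if result == 0 else 0           # result is zero
    # Signed overflow: both operands share a sign that the result lacks.
    v = ((~(a ^ b)) & (a ^ result)) >> 31 & 1
    c = total >> 32                       # carry out of bit 31
    return result, (n, z, v, c)
```

Adding -1 to -1 gives nzvc=1001, and a following add of zero to zero with that
carry as input gives 1, which agrees with the ADDX example in this section.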


AND                                                                          AND

And


Description:


Implements a bitwise logical And to compute either “r[rs1] and r[rs2]” if the i field
is zero, or “r[rs1] and sign_ext(simm13)” if the i field is one, and places the result
in the destination specified by the rd field.


Format:
    31 30 29    25 24     19 18    14  13  12            5 4      0
    | 10 |   rd   |  000001  |  rs1   | i=0 | unused (zero) | rs2  |

    31 30 29    25 24     19 18    14  13  12                     0
    | 10 |   rd   |  000001  |  rs1   | i=1 |       simm13         |


Syntax:
    and     reg_rs1, reg_rs2, reg_rd
    and     reg_rs1, immediate, reg_rd


Traps:
    (none)


Condition Code Modified:


    (none)


Example:


    mov     0x5, %l1
    mov     0x3, %l2
    and     %l1, %l2, %l3       ! %l3 = 0x1


ANDcc                                                                      ANDcc

And and modify icc


Description:


Implements a bitwise logical And to compute either “r[rs1] and r[rs2]” if the i field
is zero, or “r[rs1] and sign_ext(simm13)” if the i field is one, and places the result
in the destination specified by the rd field.


ANDcc modifies the integer condition codes.


Format:
    31 30 29    25 24     19 18    14  13  12            5 4      0
    | 10 |   rd   |  010001  |  rs1   | i=0 | unused (zero) | rs2  |

    31 30 29    25 24     19 18    14  13  12                     0
    | 10 |   rd   |  010001  |  rs1   | i=1 |       simm13         |


Syntax:
    andcc   reg_rs1, reg_rs2, reg_rd
    andcc   reg_rs1, immediate, reg_rd
Traps:
    (none)


Condition Code Modified:


    n, z, v=0, c=0


Example:
    mov     0x5, %l1
    andcc   %l1, 0xa, %l3       ! %l3 = 0x0, nzvc=0100


ANDN                                                                        ANDN

And Not


Description:


Implements a bitwise logical And Not to compute either “r[rs1] andn r[rs2]” if the
i field is zero, or “r[rs1] andn sign_ext(simm13)” if the i field is one, and places the
result in the destination specified by the rd field.


Format:
    31 30 29    25 24     19 18    14  13  12            5 4      0
    | 10 |   rd   |  000101  |  rs1   | i=0 | unused (zero) | rs2  |

    31 30 29    25 24     19 18    14  13  12                     0
    | 10 |   rd   |  000101  |  rs1   | i=1 |       simm13         |


Syntax:
    andn    reg_rs1, reg_rs2, reg_rd
    andn    reg_rs1, immediate, reg_rd
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    mov     0x5, %l1
    mov     0x3, %l2
    andn    %l1, %l2, %l3       ! %l3 = 0x4


ANDNcc                                                                    ANDNcc

And Not and modify icc


Description:


Implements a bitwise logical And Not to compute either “r[rs1] andn r[rs2]” if the
i field is zero, or “r[rs1] andn sign_ext(simm13)” if the i field is one, and places the
result in the destination specified by the rd field.


ANDNcc modifies the integer condition codes.


Format:
    31 30 29    25 24     19 18    14  13  12            5 4      0
    | 10 |   rd   |  010101  |  rs1   | i=0 | unused (zero) | rs2  |

    31 30 29    25 24     19 18    14  13  12                     0
    | 10 |   rd   |  010101  |  rs1   | i=1 |       simm13         |


Syntax:
    andncc  reg_rs1, reg_rs2, reg_rd
    andncc  reg_rs1, immediate, reg_rd
Traps:
    (none)


Condition Code Modified:


    n, z, v=0, c=0


Example:
    mov     0x5, %l1
    andncc  %l1, 0x3, %l3       ! %l3 = 0x4, nzvc=0000
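
The logical instructions of this group can be modeled in a few lines of
Python. The sketch assumes that "andn" means the first operand ANDed with the
bitwise complement of the second, and that the cc forms always clear v and c,
as listed on the pages above. The function name logic_icc is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def logic_icc(op, a, b):
    """Return (result, (n, z, v, c)) for a 32-bit "and" or "andn"."""
    a &= MASK32
    b &= MASK32
    if op == "and":
        result = a & b
    elif op == "andn":
        result = a & ~b & MASK32          # AND with the complement of b
    else:
        raise ValueError("unsupported operation: " + op)
    n = result >> 31
    z = 1 if result == 0 else 0
    return result, (n, z, 0, 0)           # v and c are always cleared
```

With operands 0x5 and 0x3, "andn" gives 0x4 with nzvc=0000, matching the
ANDNcc example.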


BA                                                                            BA

Branch Always


Description:


BA causes a PC-relative, delayed control transfer to the address “PC + (4 x
sign_ext(disp22))”, regardless of the value of the condition code bits.


If the annul field of the branch instruction is 1, the delay instruction is annulled
(not executed). If the annul field is 0, the delay instruction is executed. (Note: this
is the reverse of the case for other conditional branches.)


Format:
    31 30 29  28    25 24   22 21                                  0
    | 00 | a |  1000  |  010  |              disp22                |


Syntax:
    ba      label
    ba,a    label               ! annul bit set
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    ba      XYZ
    mov     0x4, %l1            ! delay slot
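
The target-address formula and the annul rules used by the branch pages in
this section can be sketched in Python. The function names are assumptions
made for this illustration. The delay-slot rule combines the BA case described
above with the rule stated for conditional branches (annul takes effect only
when the branch is not taken).

```python
def sign_extend(value, width):
    """Interpret the low `width` bits of value as a signed number."""
    value &= (1 << width) - 1
    if value & (1 << (width - 1)):
        return value - (1 << width)
    return value

def branch_target(pc, disp22):
    """Compute PC + (4 x sign_ext(disp22)) as a 32-bit address."""
    return (pc + 4 * sign_extend(disp22, 22)) & 0xFFFFFFFF

def delay_slot_executes(always, taken, annul):
    """Decide whether the delay instruction runs.

    always: True for BA, False for the other branches
    taken:  True if the branch condition holds
    annul:  state of the a bit
    """
    if not annul:
        return True           # a = 0: delay instruction always executes
    if always:
        return False          # BA with a = 1: delay instruction is annulled
    return taken              # a = 1: annulled only if branch is not taken
```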


BCC                                                                          BCC

Branch on Carry Clear (Branch on Greater or Equal, Unsigned)


Description:


BCC causes a PC-relative, delayed control transfer to the address “PC + (4 x
sign_ext(disp22))”, if the carry (C) bit in the PSR is clear.


The annul bit only affects execution if the branch is not taken. With the annul (a)
bit set, the delay instruction is annulled (not executed). With the annul (a) bit
clear, the delay instruction is executed.


Format:
    31 30 29  28    25 24   22 21                                  0
    | 00 | a |  1101  |  010  |              disp22                |


Syntax:
    bcc     label
    bgeu    label               ! alternate mnemonic
    bcc,a   label               ! annul bit set
    bgeu,a  label
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    bcc,a   XYZ
    mov     0x4, %l1            ! delay slot not executed if branch not taken


BCS                                                                          BCS

Branch on Carry Set (Branch on Less Than, Unsigned)


Description:


BCS causes a PC-relative, delayed control transfer to the address “PC + (4 x
sign_ext(disp22))”, if the carry (C) bit in the PSR is set.


The annul bit only affects execution if the branch is not taken. With the annul (a)
bit set, the delay instruction is annulled (not executed). With the annul (a) bit
clear, the delay instruction is executed.


Format:
    31 30 29  28    25 24   22 21                                  0
    | 00 | a |  0101  |  010  |              disp22                |


Syntax:
    bcs     label
    blu     label               ! alternate mnemonic
    bcs,a   label               ! annul bit set
    blu,a   label
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    bcs     XYZ
    mov     0x4, %l1            ! delay slot


BE                                                                            BE

Branch on Equal (Branch on Zero)


Description:


BE causes a PC-relative, delayed control transfer to the address “PC + (4 x
sign_ext(disp22))”, if Z is set.


The annul bit only affects execution if the branch is not taken. With the annul (a)
bit set, the delay instruction is annulled (not executed). With the annul (a) bit
clear, the delay instruction is executed.


Format:
    31 30 29  28    25 24   22 21                                  0
    | 00 | a |  0001  |  010  |              disp22                |


Syntax:
    be      label
    bz      label               ! alternate mnemonic
    be,a    label               ! annul bit set
    bz,a    label
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    bz      XYZ
    mov     0x4, %l1            ! delay slot


BG                                                                            BG

Branch on Greater


Description:


BG causes a PC-relative, delayed control transfer to the address “PC + (4 x
sign_ext(disp22))”, if “not(Z or (N xor V))” is true.


The annul bit only affects execution if the branch is not taken. With the annul (a)
bit set, the delay instruction is annulled (not executed). With the annul (a) bit
clear, the delay instruction is executed.


Format:
    31 30 29  28    25 24   22 21                                  0
    | 00 | a |  1010  |  010  |              disp22                |


Syntax:
    bg      label
    bg,a    label               ! annul bit set
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    bg      XYZ
    mov     0x4, %l1            ! delay slot


BGE                                                                          BGE

Branch on Greater or Equal


Description:


BGE causes a PC-relative, delayed control transfer to the address “PC + (4 x
sign_ext(disp22))”, if “not(N xor V)” is true.


The annul bit only affects execution if the branch is not taken. With the annul (a)
bit set, the delay instruction is annulled (not executed). With the annul (a) bit
clear, the delay instruction is executed.


Format:
    31 30 29  28    25 24   22 21                                  0
    | 00 | a |  1011  |  010  |              disp22                |


Syntax:
    bge     label
    bge,a   label               ! annul bit set
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    bge     XYZ
    mov     0x4, %l1            ! delay slot


BGU                                                                          BGU

Branch on Greater, Unsigned


Description:


BGU causes a PC-relative, delayed control transfer to the address “PC + (4 x
sign_ext(disp22))”, if “not(C or Z)” is true.


The annul bit only affects execution if the branch is not taken. With the annul (a)
bit set, the delay instruction is annulled (not executed). With the annul (a) bit
clear, the delay instruction is executed.


Format:
    31 30 29  28    25 24   22 21                                  0
    | 00 | a |  1100  |  010  |              disp22                |


Syntax:
    bgu     label
    bgu,a   label               ! annul bit set
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    bgu     XYZ
    mov     0x4, %l1            ! delay slot


BL                                                                            BL

Branch on Less


Description:


BL causes a PC-relative, delayed control transfer to the address “PC + (4 x
sign_ext(disp22))”, if “N xor V” is true.


The annul bit only affects execution if the branch is not taken. With the annul (a)
bit set, the delay instruction is annulled (not executed). With the annul (a) bit
clear, the delay instruction is executed.


Format:
    31 30 29  28    25 24   22 21                                  0
    | 00 | a |  0011  |  010  |              disp22                |


Syntax:
    bl      label
    bl,a    label               ! annul bit set
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    bl      XYZ
    mov     0x4, %l1            ! delay slot


BLE                                                                          BLE

Branch on Less or Equal


Description:


BLE causes a PC-relative, delayed control transfer to the address “PC + (4 x
sign_ext(disp22))”, if “Z or (N xor V)” is true.


The annul bit only affects execution if the branch is not taken. With the annul (a)
bit set, the delay instruction is annulled (not executed). With the annul (a) bit
clear, the delay instruction is executed.


Format:
    31 30 29  28    25 24   22 21                                  0
    | 00 | a |  0010  |  010  |              disp22                |


Syntax:
    ble     label
    ble,a   label               ! annul bit set
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    ble     XYZ
    mov     0x4, %l1            ! delay slot


BLEU                                                                        BLEU

Branch on Less or Equal, Unsigned


Description:


BLEU causes a PC-relative, delayed control transfer to the address “PC + (4 x
sign_ext(disp22))”, if “C or Z” is true.


The annul bit only affects execution if the branch is not taken. With the annul (a)
bit set, the delay instruction is annulled (not executed). With the annul (a) bit
clear, the delay instruction is executed.


Format:
    31 30 29  28    25 24   22 21                                  0
    | 00 | a |  0100  |  010  |              disp22                |


Syntax:
    bleu    label
    bleu,a  label               ! annul bit set
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    bleu    XYZ
    mov     0x4, %l1            ! delay slot


BN                                                                            BN

Branch Never


Description:


BN acts like a “NOP” except that if the annul field is one, the delay instruction is
not executed (annulled). If the annul (a) field is zero, the delay instruction is
executed.


Format:
    31 30 29  28    25 24   22 21                                  0
    | 00 | a |  0000  |  010  |              disp22                |


Syntax:
    bn      label
    bn,a    label               ! annul bit set
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    bn      XYZ
    mov     0x4, %l1            ! delay slot


BNE                                                                          BNE

Branch on Not Equal (Branch on Not Zero)


Description:


BNE causes a PC-relative, delayed control transfer to the address “PC + (4 x
sign_ext(disp22))”, if Z is clear.


The annul bit only affects execution if the branch is not taken. With the annul (a)
bit set, the delay instruction is annulled (not executed). With the annul (a) bit
clear, the delay instruction is executed.


Format:
    31 30 29  28    25 24   22 21                                  0
    | 00 | a |  1001  |  010  |              disp22                |


Syntax:
    bne     label
    bnz     label               ! alternate mnemonic
    bne,a   label               ! annul bit set
    bnz,a   label
Traps:
    (none)


Condition Code Modified:


    (none)
Example:
    bnz     XYZ
    mov     0x4, %l1            ! delay slot
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BNEG 


B h N ti 
Se ance ee MRE ESR REE EE ARSE EE TEE BARES bas LADO ON pec BELO LER NAUEE IN 


as 





Description: 





co 
FUJITSU 


BNEG 


BNEG causes a PC-relative, delayed control transfer to the address “PC + (4 x 


sign_ext(disp22))”, if N is set. 


The annul bit only affects execution if the branch is not taken. With the annul (a) 
bit set, the delay instruction is annulled (not executed). With the annul (a) bit 


clear, the delay instruction is executed. 


Format: 
31 30 29 28 25 24 22 21 0 
foo fal ono [| mo | ——S—SeSSC—“—SCS 
Syntax: 
bneg label 
bneg,a label ' annul bit set 
Traps: 
(none) 


Condition Code Modified: 


(none) 
Example: 
bneg XYZ 
mov Ox4, S11 ! delay slot 





BPOS

Branch on Positive

Description:

BPOS causes a PC-relative, delayed control transfer to the address “PC + (4 x
sign_ext(disp22))”, if N is clear.

The annul bit only affects execution if the branch is not taken. With the annul (a)
bit set, the delay instruction is annulled (not executed). With the annul (a) bit
clear, the delay instruction is executed.

Format:
 31 30 | 29 | 28 25 | 24 22 | 21        0
   00  | a  | 1110  |  010  |  disp22

Syntax:
bpos label
bpos,a label ! annul bit set

Traps:
(none)

Condition Code Modified:
(none)

Example:
bpos XYZ
mov 0x4, %l1 ! delay slot


BVC

Branch on Overflow Clear

Description:

BVC causes a PC-relative, delayed control transfer to the address “PC + (4 x
sign_ext(disp22))”, if V is clear.

The annul bit only affects execution if the branch is not taken. With the annul (a)
bit set, the delay instruction is annulled (not executed). With the annul (a) bit
clear, the delay instruction is executed.

Format:
 31 30 | 29 | 28 25 | 24 22 | 21        0
   00  | a  | 1111  |  010  |  disp22

Syntax:
bvc label
bvc,a label ! annul bit set

Traps:
(none)

Condition Code Modified:
(none)

Example:
bvc XYZ
mov 0x4, %l1 ! delay slot


BVS

Branch on Overflow Set

Description:

BVS causes a PC-relative, delayed control transfer to the address “PC + (4 x
sign_ext(disp22))”, if V is set.

The annul bit only affects execution if the branch is not taken. With the annul (a)
bit set, the delay instruction is annulled (not executed). With the annul (a) bit
clear, the delay instruction is executed.

Format:
 31 30 | 29 | 28 25 | 24 22 | 21        0
   00  | a  | 0111  |  010  |  disp22

Syntax:
bvs label
bvs,a label ! annul bit set

Traps:
(none)

Condition Code Modified:
(none)

Example:
bvs XYZ
mov 0x4, %l1 ! delay slot


CALL

Call Instruction

Description:

The CALL instruction causes an unconditional, delayed, PC-relative control
transfer to address “PC + (4 x disp30)”. Since the word displacement field is 30
bits wide, the target address can be arbitrarily distant. The CALL instruction also
writes the value of PC, which contains the address of the CALL, into %o7 (r[15]).

Format:
 31 30 | 29        0
   01  |  disp30

Syntax:
call label

Traps:
(none)

Condition Code Modified:
(none)

Example:
call XYZ
mov 0x4, %l1 ! delay slot


DIVScc

Divide Step

Description:


The DIVScc instruction performs one bit-cycle of a non-restoring, shift-before- 
add, signed or unsigned division. Initially, the most significant half of the divi- 
dend is in the Y register, the least significant half is in r[rs1]. The divisor is in 
r[rs2]. Subsequently, the most significant half of the partial remainder is in the Y 
register, the least significant half is in r[rs1]. 


DIVScc operates as follows:

1. The true sign is formed using the negative (n) and overflow (v) integer
condition codes from the Processor Status Register. True sign = n XOR v.

2. The remainder is formed by upshifting the Y register (initially the most
significant word of the dividend) one bit, and setting the least significant bit
of the remainder equal to the most significant bit of r[rs1] (initially the least
significant word of the dividend).

3. The divisor is r[rs2] if the i field is 0, or simm13, sign-extended to 32 bits, if
the i field is 1.

4. If true sign = 0 (+), the ALU computes remainder - divisor. If true sign = 1 (-),
the ALU computes remainder + divisor.

5. Carry out from the ALU operation is noted as c0. The negative (n) condition
code is set to bit 31 of the ALU result. The zero (z) condition code is set if the
ALU result is 0 AND the true sign equals Y[31], else cleared.

6. The new true sign is formed as (true sign AND NOT Y[31]) OR (NOT c0 AND
(true sign OR NOT Y[31])).

7. The overflow (v) condition code is formed as new true sign XOR bit 31 of the
ALU result. The carry (c) condition code is set to NOT new true sign. Y is set to
the 32-bit ALU result. If rd is not 0, then r[rd] is set to r[rs1], upshifted one bit
with NOT new true sign (the new quotient bit) in the least significant bit
position.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   10  |  rd   | 011101 |  rs1  | i=0 |   reserved    |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   10  |  rd   | 011101 |  rs1  | i=1 |       simm13

Syntax:
divscc reg_rs1, reg_rs2, reg_rd
divscc reg_rs1, immediate, reg_rd

Traps:
(none)

Condition Code Modified:
n, z, v, c

Example:
See Chapter 5 “Programming Considerations” for sample signed and unsigned
division routines based on the DIVScc instruction as well as some application
examples.
JMPL

Jump and Link

Description:

The JMPL instruction causes a register-indirect control transfer to an address
specified by either “r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] +
sign_ext(simm13)” if the i field is one.

The JMPL instruction writes the PC, which contains the address of the JMPL
instruction, into the destination r register specified in the rd field.

If either of the low-order two bits of the jump address is nonzero, a
mem_address_not_aligned trap occurs.
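
A small Python model may help make the address computation and the link write concrete. It is a sketch under stated assumptions, not the processor's implementation: the register file is a plain list, the names `jmpl`, `sign_ext13` and `MemAddressNotAligned` are invented for illustration, and the alignment check is assumed to happen before the destination register is written.

```python
MASK32 = 0xFFFFFFFF

class MemAddressNotAligned(Exception):
    """Stands in for the mem_address_not_aligned trap (illustrative)."""

def sign_ext13(simm13):
    """Sign-extend the 13-bit immediate field."""
    return (simm13 & 0xFFF) - (simm13 & 0x1000)

def jmpl(pc, regs, rs1, rd, i, rs2=0, simm13=0):
    """Return the jump target; write the address of the JMPL into r[rd]."""
    operand2 = sign_ext13(simm13) if i else regs[rs2]
    target = (regs[rs1] + operand2) & MASK32
    if target & 0x3:                 # low-order two bits must be zero
        raise MemAddressNotAligned(hex(target))
    if rd != 0:                      # r[0] (%g0) always reads as zero
        regs[rd] = pc
    return target
```

With rs1 = 31, simm13 = 8 and rd = 0 this models the return from a non-leaf subroutine; with rd = 15 it models a register-indirect call.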


Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   10  |  rd   | 111000 |  rs1  | i=0 | unused (zero) |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   10  |  rd   | 111000 |  rs1  | i=1 |       simm13

Syntax:
jmpl reg_rs1 + reg_rs2, reg_rd
jmpl reg_rs1 + immediate, reg_rd

Traps:
mem_address_not_aligned

Condition Code Modified:
(none)

Example:
jmpl %l2+0xf8, %g0
mov 0xfe, %l1 ! delay slot

Notes:

• JMPL with rd = %g0 can be used to return from a subroutine.

• For a non-leaf subroutine the typical return address is “r[31]+8”, if the
subroutine was entered by a call instruction. (Note: The pseudo operation “ret”
invokes this return address.) A leaf subroutine (no use of save, no call to
other subroutines) can use “r[15]+8” as the return address. (Note: The pseudo
operation “retl” invokes this return address.)

• JMPL with rd = 15 can be used as a register-indirect CALL.

• When the delay slot instruction of JMPL is RETT, the target of the JMPL is
fetched from the address space selected by the state of the machine after the
RETT is executed. This is important when returning from a trap handler (which
runs in supervisor space) to user address space.
LD

Load Word

Description:

The LD instruction moves a word from memory into the r register defined by the
rd field. The source value is loaded from either “r[rs1] + r[rs2]” if the i field is zero,
or “r[rs1] + sign_ext(simm13)” if the i field is one.

The address space identifier (ASI) indicates either user data (0xA) or supervisor
data (0xB) according to the S bit of the PSR.

If the LD instruction traps, the destination register (rd) remains unchanged.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   11  |  rd   | 000000 |  rs1  | i=0 | unused (zero) |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   11  |  rd   | 000000 |  rs1  | i=1 |       simm13

Syntax:
ld [reg_rs1 + reg_rs2], reg_rd
ld [reg_rs1 +/- immediate], reg_rd

Traps:
mem_address_not_aligned
data_access_exception

Condition Code Modified:
(none)

Example:
ld [%g0 + 0xfe0], %l4
ld [0xfe0], %l4 ! recognized as equivalent
LDA

Load Word from Alternate Space

Description:

The LDA instruction moves a word from memory into the r register defined by
the rd field. The source value is loaded from “r[rs1] + r[rs2]” with the ASI field
designating the ASI value.

If the LDA instruction traps, the destination register (rd) remains unchanged.
LDA is a privileged instruction which can only be executed in supervisor mode.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   11  |  rd   | 010000 |  rs1  | i=0 |      ASI      |  rs2

Syntax:
lda [reg_rs1 + reg_rs2]ASI, reg_rd

Traps:
mem_address_not_aligned
data_access_exception
privileged_instruction (if not supervisor mode)
illegal_instruction (if i=1)

Condition Code Modified:
(none)

Example:
lda [%l1 + %l2]0xf, %l4 ! ASI value 15 decimal
LDD

Load Doubleword

Description:

The LDD instruction moves two words from memory into an r register pair. The
most significant word at the effective memory address is moved into the even r
register. The least significant word, which is at the effective memory address + 4,
is moved into the odd r register. The least significant bit of the rd field is ignored.

The source value is loaded from either “r[rs1] + r[rs2]” if the i field is zero, or
“r[rs1] + sign_ext(simm13)” if the i field is one.

The address space identifier (ASI) indicates either user data (0xA) or supervisor
data (0xB) according to the S bit of the PSR.

If the LDD instruction traps while loading the second word, the even destination
register (rd_even) will have been changed.
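
The register-pair rule can be illustrated with a short Python sketch. It is a simplified model under assumptions: memory is a dictionary mapping byte addresses to 32-bit words, the function name `ldd` is hypothetical, the doubleword is assumed to require 8-byte alignment (as in the SPARC architecture), and the partial-update behavior on a trap during the second word is not modeled.

```python
MASK32 = 0xFFFFFFFF

def ldd(memory, address, regs, rd):
    """Load two words into the even/odd register pair selected by rd."""
    if address % 8 != 0:                  # assumed doubleword alignment rule
        raise ValueError("mem_address_not_aligned")
    even = rd & ~1                        # least significant bit of rd is ignored
    high = memory[address] & MASK32       # most significant word
    low = memory[address + 4] & MASK32    # least significant word
    if even != 0:                         # r[0] (%g0) always reads as zero
        regs[even] = high
    regs[even + 1] = low
```

Passing rd = 3 has the same effect as rd = 2: the pair r[2], r[3] is loaded.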


Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   11  |  rd   | 000011 |  rs1  | i=0 | unused (zero) |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   11  |  rd   | 000011 |  rs1  | i=1 |       simm13

Syntax:
ldd [reg_rs1 + reg_rs2], reg_rd
ldd [reg_rs1 +/- immediate], reg_rd

Traps:
mem_address_not_aligned
data_access_exception

Condition Code Modified:
(none)

Example:
ldd [%i5 + %l2], %g2
LDDA

Load Doubleword from Alternate Space

Description:

The LDDA instruction moves two words from memory into an r register pair.
The most significant word at the effective memory address is moved into the even
r register. The least significant word, which is at the effective memory address +
4, is moved into the odd r register. The least significant bit of the rd field is
ignored.

The source value is loaded from “r[rs1] + r[rs2]” with the ASI field designating the
ASI value.

If the LDDA instruction traps while loading the second word, the even destination
register (rd_even) will have been changed.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   11  |  rd   | 010011 |  rs1  | i=0 |      ASI      |  rs2

Syntax:
ldda [reg_rs1 + reg_rs2]ASI, reg_rd
ldda [reg_rs1 +/- immediate], reg_rd

Traps:
mem_address_not_aligned
data_access_exception
privileged_instruction (if not supervisor mode)
illegal_instruction (if i=1)

Condition Code Modified:
(none)

Example:
ldda [%g7 - 5]0x1, %o4
LDSB

Load Signed Byte

Description:

The LDSB instruction moves a byte from memory into the r register defined by
the rd field. The fetched byte is right-justified in rd and is sign-extended. The
source value is loaded from either “r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] +
sign_ext(simm13)” if the i field is one.

The address space identifier (ASI) indicates either user data (0xA) or supervisor
data (0xB) according to the S bit of the PSR.

If the LDSB instruction traps, the destination register (rd) remains unchanged.
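
As a brief illustration of "right-justified and sign-extended", the following Python sketch contrasts a signed byte load with a zero-filled one. The function names `ldsb` and `ldub` and the use of a `bytes` object as memory are assumptions made for this example only.

```python
MASK32 = 0xFFFFFFFF

def ldsb(memory, address):
    """Load a byte and sign-extend it to a 32-bit register value."""
    byte = memory[address]
    if byte & 0x80:                 # bit 7 is the sign of the byte
        byte -= 0x100
    return byte & MASK32            # register contents as a 32-bit pattern

def ldub(memory, address):
    """Load a byte and zero-fill the upper 24 bits (for comparison)."""
    return memory[address]
```

A byte of 0x80 therefore reads back as 0xFFFFFF80 with the signed load and as 0x00000080 with the unsigned load.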


Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   11  |  rd   | 001001 |  rs1  | i=0 | unused (zero) |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   11  |  rd   | 001001 |  rs1  | i=1 |       simm13

Syntax:
ldsb [reg_rs1 + reg_rs2], reg_rd
ldsb [reg_rs1 +/- immediate], reg_rd

Traps:
data_access_exception

Condition Code Modified:
(none)

Example:
ldsb [%g0 + 0xfe0], %l4
LDSBA

Load Signed Byte from Alternate Space

Description:

The LDSBA instruction moves a byte from memory into the r register defined by
the rd field. The fetched byte is right-justified in rd and is sign-extended. The
source value is loaded from “r[rs1] + r[rs2]” with the ASI field designating the ASI
value.

If the LDSBA instruction traps, the destination register (rd) remains unchanged.
LDSBA is a privileged instruction which can only be executed in supervisor
mode.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   11  |  rd   | 011001 |  rs1  | i=0 |      ASI      |  rs2

Syntax:
ldsba [reg_rs1 + reg_rs2]ASI, reg_rd

Traps:
data_access_exception
privileged_instruction (if not supervisor mode)
illegal_instruction (if i=1)

Condition Code Modified:
(none)

Example:
ldsba [%l1 + %l2]0xf, %l4 ! ASI value 15 decimal
LDSH

Load Signed Halfword

Description:

The LDSH instruction moves a halfword from memory into the r register defined
by the rd field. The fetched halfword is right-justified in rd and is sign-extended.
The source value is loaded from either “r[rs1] + r[rs2]” if the i field is zero, or
“r[rs1] + sign_ext(simm13)” if the i field is one.

The address space identifier (ASI) indicates either user data (0xA) or supervisor
data (0xB) according to the S bit of the PSR.

If the LDSH instruction traps, the destination register (rd) remains unchanged.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   11  |  rd   | 001010 |  rs1  | i=0 | unused (zero) |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   11  |  rd   | 001010 |  rs1  | i=1 |       simm13

Syntax:
ldsh [reg_rs1 + reg_rs2], reg_rd
ldsh [reg_rs1 +/- immediate], reg_rd

Traps:
data_access_exception
mem_address_not_aligned

Condition Code Modified:
(none)

Example:
ldsh [%g0 + 0xfe0], %l4
LDSHA

Load Signed Halfword from Alternate Space

Description:

The LDSHA instruction moves a halfword from memory into the r register defined
by the rd field. The fetched halfword is right-justified in rd and is sign-extended.
The source value is loaded from “r[rs1] + r[rs2]” with the ASI field designating the
ASI value.

If the LDSHA instruction traps, the destination register (rd) remains unchanged.
LDSHA is a privileged instruction which can only be executed in supervisor
mode.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   11  |  rd   | 011010 |  rs1  | i=0 |      ASI      |  rs2

Syntax:
ldsha [reg_rs1 + reg_rs2]ASI, reg_rd

Traps:
data_access_exception
mem_address_not_aligned
privileged_instruction (if not supervisor mode)
illegal_instruction (if i=1)

Condition Code Modified:
(none)

Example:
ldsha [%l1 + %l2]0xf, %l4 ! ASI value 15 decimal
LDSTUB

Atomic Load-Store Unsigned Byte

Description:

The LDSTUB instruction moves a byte from memory into an r register identified
by the rd field and then rewrites the same byte in memory to all ones atomically
(without allowing intervening asynchronous traps). The value in the rd register is
right-justified and zero-filled.

The source value is loaded from either “r[rs1] + r[rs2]” if the i field is zero, or
“r[rs1] + sign_ext(simm13)” if the i field is one.

The address space identifier (ASI) indicates either user data (0xA) or supervisor
data (0xB) according to the S bit of the PSR.

If the LDSTUB instruction traps, memory remains unchanged.
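
The load-then-set behavior is commonly used as a test-and-set primitive for simple locks. The Python sketch below models that usage; it is an illustration only. The names `ldstub` and `try_acquire`, the `bytearray` memory, and the convention that a zero byte means "lock free" are assumptions, and a single-threaded model cannot demonstrate the atomicity itself.

```python
def ldstub(memory, address):
    """Return the old byte (zero-filled) and set the byte in memory to all ones."""
    old = memory[address]
    memory[address] = 0xFF
    return old

def try_acquire(memory, lock_address):
    """Assumed lock convention: 0x00 means free, 0xFF means held."""
    return ldstub(memory, lock_address) == 0
```

The first caller sees the old value 0 and owns the lock; every later caller sees 0xFF until the owner stores zero back to the byte.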


Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   11  |  rd   | 001101 |  rs1  | i=0 | unused (zero) |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   11  |  rd   | 001101 |  rs1  | i=1 |       simm13

Syntax:
ldstub [reg_rs1 + reg_rs2], reg_rd
ldstub [reg_rs1 +/- immediate], reg_rd

Traps:
data_access_exception

Condition Code Modified:
(none)


Example: 


ldstub [%l0 + 0x1], %o0
LDSTUBA

Atomic Load-Store Unsigned Byte into Alternate Space

Description:

The LDSTUBA instruction moves a byte from memory into an r register identified
by the rd field and then rewrites the same byte in memory to all ones atomically
(without allowing intervening asynchronous traps). The value in the rd register is
right-justified and zero-filled.

The source value is loaded from “r[rs1] + r[rs2]” with the ASI field designating the
ASI value.

If the LDSTUBA instruction traps, memory remains unchanged. LDSTUBA is a
privileged instruction which can only be executed in supervisor mode.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   11  |  rd   | 011101 |  rs1  | i=0 |      ASI      |  rs2

Syntax:
ldstuba [reg_rs1 + reg_rs2]ASI, reg_rd

Traps:
data_access_exception
privileged_instruction (if not supervisor mode)
illegal_instruction (if i=1)

Condition Code Modified:
(none)

Example:
ldstuba [%l1 + %l2]0xf, %l4 ! ASI value 15 decimal
LDUB

Load Unsigned Byte

Description:

The LDUB instruction moves an unsigned byte from memory into the r register
defined by the rd field. The fetched byte is right-justified in rd and is zero-filled.
The source value is loaded from either “r[rs1] + r[rs2]” if the i field is zero,
or “r[rs1] + sign_ext(simm13)” if the i field is one.

The address space identifier (ASI) indicates either user data (0xA) or supervisor
data (0xB) according to the S bit of the PSR.

If the LDUB instruction traps, the destination register (rd) remains unchanged.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   11  |  rd   | 000001 |  rs1  | i=0 | unused (zero) |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   11  |  rd   | 000001 |  rs1  | i=1 |       simm13

Syntax:
ldub [reg_rs1 + reg_rs2], reg_rd
ldub [reg_rs1 +/- immediate], reg_rd

Traps:
data_access_exception

Condition Code Modified:
(none)

Example:
ldub [%g0 + 0xfe0], %l4
LDUBA

Load Unsigned Byte from Alternate Space

Description:

The LDUBA instruction moves a byte from memory into the r register defined by
the rd field. The fetched byte is right-justified in rd and is zero-filled. The source
value is loaded from “r[rs1] + r[rs2]” with the ASI field designating the ASI value.

If the LDUBA instruction traps, the destination register (rd) remains unchanged.
LDUBA is a privileged instruction which can only be executed in supervisor
mode.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   11  |  rd   | 010001 |  rs1  | i=0 |      ASI      |  rs2

Syntax:
lduba [reg_rs1 + reg_rs2]ASI, reg_rd

Traps:
data_access_exception
privileged_instruction (if not supervisor mode)
illegal_instruction (if i=1)

Condition Code Modified:
(none)

Example:
lduba [%l1 + %l2]0xf, %l4 ! ASI value 15 decimal
LDUH

Load Unsigned Halfword

Description:

The LDUH instruction moves a halfword from memory into the r register defined
by the rd field. The fetched halfword is right-justified in rd and is zero-filled. The
source value is loaded from either “r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] +
sign_ext(simm13)” if the i field is one.

The address space identifier (ASI) indicates either user data (0xA) or supervisor
data (0xB) according to the S bit of the PSR.

If the LDUH instruction traps, the destination register (rd) remains unchanged.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   11  |  rd   | 000010 |  rs1  | i=0 | unused (zero) |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   11  |  rd   | 000010 |  rs1  | i=1 |       simm13

Syntax:
lduh [reg_rs1 + reg_rs2], reg_rd
lduh [reg_rs1 +/- immediate], reg_rd

Traps:
data_access_exception
mem_address_not_aligned

Condition Code Modified:
(none)

Example:
lduh [%g7 - 0xfeb], %l4
LDUHA

Load Unsigned Halfword from Alternate Space

Description:

The LDUHA instruction moves a halfword from memory into the r register
defined by the rd field. The fetched halfword is right-justified in rd and is
zero-filled. The source value is loaded from “r[rs1] + r[rs2]” with the ASI field
designating the ASI value.

If the LDUHA instruction traps, the destination register (rd) remains unchanged.
LDUHA is a privileged instruction which can only be executed in supervisor
mode.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   11  |  rd   | 010010 |  rs1  | i=0 |      ASI      |  rs2

Syntax:
lduha [reg_rs1 + reg_rs2]ASI, reg_rd

Traps:
data_access_exception
privileged_instruction (if not supervisor mode)
illegal_instruction (if i=1)

Condition Code Modified:
(none)

Example:
lduha [%g7 - 0xfeb]0xee, %l3
MULScc

Multiply Step

Description:


The MULScc can be used to generate up to 64-bit products of two signed or 
unsigned words. MULScc works as follows: 


1. Compute the value obtained by shifting “r[rs1]” (the incoming partial prod- 
uct) right by one bit and replacing its high-order bit by “N xor V” (the sign of 
the previous partial product). 


2. If the least significant bit of the Y register (the multiplier) is set, the value from 
step (1) is added to the multiplicand. The multiplicand is “r[rs2]” if the i field 
is zero or is “sign_ext(simm13)” if the 7 field is one. If the LSB of the Y register 
is not set, then zero is added to the value from step (1). 


3. The result from step (2) is written into “r[rd]” (the outgoing partial product). 
The PSR’s integer condition codes are updated according to the addition per- 
formed in step (2). 


4. The Y register (the multiplier) is shifted right by one bit and its high-order bit
is replaced by the least significant bit of “r[rs1]” (the incoming partial
product).

It should be noted that, for most applications, the UMUL/SMUL instructions are
a faster and more efficient means of multiplying integer values. However,
MULScc can be used for other bit manipulations. See Chapter 5 “Programming
Considerations” for details.
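
The four steps above can be modeled in Python as follows. This is a sketch for illustration, not a definitive routine: the names `mulscc_step` and `multiply` are invented here, and the driver assumes the conventional sequence of 32 multiply steps followed by one final step with a zero operand to complete the shift, with the multiplicand restricted to values below 2**31 so that no sign correction is needed.

```python
MASK32 = 0xFFFFFFFF

def mulscc_step(rs1, operand2, y, n, v):
    """One MULScc step. Returns (rd, y, n, z, v, c)."""
    # Step 1: shift r[rs1] right one bit; bit 31 becomes N xor V.
    op1 = ((n ^ v) << 31) | ((rs1 & MASK32) >> 1)
    # Step 2: add the multiplicand only if the LSB of Y is set.
    op2 = (operand2 & MASK32) if (y & 1) else 0
    total = op1 + op2
    # Step 3: write the result and update the condition codes for the addition.
    rd = total & MASK32
    new_n = rd >> 31
    new_z = int(rd == 0)
    new_v = int((op1 >> 31) == (op2 >> 31) and (rd >> 31) != (op1 >> 31))
    new_c = total >> 32
    # Step 4: shift Y right; its bit 31 becomes the LSB of r[rs1].
    new_y = ((y & MASK32) >> 1) | ((rs1 & 1) << 31)
    return rd, new_y, new_n, new_z, new_v, new_c

def multiply(multiplier, multiplicand):
    """64-bit product; assumes 0 <= multiplicand < 2**31 (no correction step)."""
    y, partial, n, v = multiplier & MASK32, 0, 0, 0
    for _ in range(32):
        partial, y, n, _z, v, _c = mulscc_step(partial, multiplicand, y, n, v)
    # Final step with a zero operand shifts the last bit into Y.
    partial, y, n, _z, v, _c = mulscc_step(partial, 0, y, n, v)
    return (partial << 32) | y
```

After the final step the high word of the product is in the partial-product register and the low word is in Y.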


Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   10  |  rd   | 100100 |  rs1  | i=0 |   reserved    |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   10  |  rd   | 100100 |  rs1  | i=1 |       simm13

Syntax:
mulscc reg_rs1, reg_rs2, reg_rd
mulscc reg_rs1, immediate, reg_rd

Traps:
(none)

Condition Code Modified:
n, z, v, c

Example:
mulscc %o4, %o1, %o4
NOP

No Operation

Description:

The NOP instruction changes no program-visible state (except the PC and nPC).

Format:
 31 30 | 29 25 | 24 22 | 21                     0
   00  | 00000 |  100  | 0000000000000000000000

Syntax:
nop

Traps:
(none)

Condition Code Modified:
(none)

Example:
bz target
nop ! delay slot
OR

Inclusive Or

Description:

Implements a bitwise logical inclusive Or to compute either “r[rs1] or r[rs2]” if the
i field is zero, or “r[rs1] or sign_ext(simm13)” if the i field is one, and places the
result in the destination specified by the rd field.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   10  |  rd   | 000010 |  rs1  | i=0 | unused (zero) |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   10  |  rd   | 000010 |  rs1  | i=1 |       simm13

Syntax:
or reg_rs1, reg_rs2, reg_rd
or reg_rs1, immediate, reg_rd

Traps:
(none)

Condition Code Modified:
(none)

Example:
or %g0, -1, %o3 ! mov -1, %o3 equivalent
ORcc

Inclusive OR and modify icc

Description:

Implements a bitwise logical inclusive Or to compute either “r[rs1] or r[rs2]” if the
i field is zero, or “r[rs1] or sign_ext(simm13)” if the i field is one, and places the
result in the destination specified by the rd field.
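
A minimal Python sketch of the operation and its effect on the integer condition codes is shown here. The function name `orcc` and the dictionary used for the condition codes are assumptions for illustration; the flag behavior follows the Condition Code Modified entry for this instruction (n and z from the result, v and c cleared).

```python
MASK32 = 0xFFFFFFFF

def orcc(rs1_value, operand2):
    """Return (result, icc) for a bitwise inclusive OR that updates icc."""
    result = (rs1_value | operand2) & MASK32
    icc = {
        "n": result >> 31,        # bit 31 of the result
        "z": int(result == 0),    # set when the result is zero
        "v": 0,                   # always cleared
        "c": 0,                   # always cleared
    }
    return result, icc
```

ORing a register with zero and discarding the result leaves only the flag update, which is how a register can be tested for zero or negative.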


Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   10  |  rd   | 010010 |  rs1  | i=0 | unused (zero) |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   10  |  rd   | 010010 |  rs1  | i=1 |       simm13

Syntax:
orcc reg_rs1, reg_rs2, reg_rd
orcc reg_rs1, immediate, reg_rd

Traps:
(none)

Condition Code Modified:
n, z, v=0, c=0

Example:
mov -1, %o3
orcc %o3, 0, %g0 ! tst %o3 equivalent, nzvc=1000
ORN

Inclusive Or Not

Description:

Implements a bitwise logical inclusive Or Not to compute either “r[rs1] orn r[rs2]”
if the i field is zero, or “r[rs1] orn sign_ext(simm13)” if the i field is one, and places
the result in the destination specified by the rd field.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   10  |  rd   | 000110 |  rs1  | i=0 | unused (zero) |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   10  |  rd   | 000110 |  rs1  | i=1 |       simm13

Syntax:
orn reg_rs1, reg_rs2, reg_rd
orn reg_rs1, immediate, reg_rd

Traps:
(none)

Condition Code Modified:
(none)

Example:
orn %g0, 3, %o1 ! all 1’s except bottom two bits to reg o1


ORNcc

Inclusive Or Not and modify icc

Description:

Implements a bitwise logical inclusive Or Not to compute either “r[rs1] orn r[rs2]”
if the i field is zero, or “r[rs1] orn sign_ext(simm13)” if the i field is one, and places
the result in the destination specified by the rd field.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   10  |  rd   | 010110 |  rs1  | i=0 | unused (zero) |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   10  |  rd   | 010110 |  rs1  | i=1 |       simm13

Syntax:
orncc reg_rs1, reg_rs2, reg_rd
orncc reg_rs1, immediate, reg_rd

Traps:
(none)

Condition Code Modified:
n, z, v=0, c=0

Example:


orncc %g0, %l1, %o3
RDASR

Read Ancillary State Register

Description:

Reads the contents of the ancillary state register specified by the rs1 field into the
destination register rd.

On the SPARClite MB86930 a valid value for rs1 is 17. All other values of rs1 will
generate an illegal_instruction trap.

All reserved fields should be programmed as 0. RDASR is a privileged
instruction.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13        0
   10  |  rd   | 101000 |  rs1  |  reserved

Syntax:
rd asr_reg_rs1, reg_rd

Traps:
illegal_instruction
privileged_instruction

Condition Code Modified:
(none)

Example:
rd %asr17, %g1
RDPSR

Read Processor State Register

Description:

RDPSR reads the contents of the Processor State Register into the destination
register rd.

All reserved fields should be programmed as 0. RDPSR is a privileged instruction.

Format:
 31 30 | 29 25 | 24  19 | 18 14   | 13        0
   10  |  rd   | 101001 | reserved |  reserved

Syntax:
rd %psr, reg_rd

Traps:
privileged_instruction

Condition Code Modified:
(none)

Example:
rd %psr, %o1


RDTBR

Read Trap Base Register

Description:

RDTBR reads the contents of the Trap Base Register into the destination register
rd.

All reserved fields should be programmed as 0. RDTBR is a privileged
instruction.

Format:
 31 30 | 29 25 | 24  19 | 18 14   | 13        0
   10  |  rd   | 101011 | reserved |  reserved

Syntax:
rd %tbr, reg_rd

Traps:
privileged_instruction

Condition Code Modified:
(none)

Example:
rd %tbr, %g1
RDWIM

Read Window Invalid Mask Register

Description:

RDWIM reads the contents of the Window Invalid Mask Register into the
destination register rd.

All reserved fields should be programmed as 0. RDWIM is a privileged
instruction.

Format:
 31 30 | 29 25 | 24  19 | 18 14   | 13        0
   10  |  rd   | 101010 | reserved |  reserved

Syntax:
rd %wim, reg_rd

Traps:
privileged_instruction

Condition Code Modified:
(none)

Example:
rd %wim, %g0


RDY

Read Y Register

Description:

RDY reads the contents of the Y register into the destination register rd.
Unlike the other read state register instructions, RDY is not privileged. All
reserved fields should be programmed as 0.

Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13        0
   10  |  rd   | 101000 | 00000 |  reserved

Syntax:
rd %y, reg_rd

Traps:
(none)

Condition Code Modified:
(none)

Example:
rd %y, %o0


RESTORE

Restore Caller’s Window

Description:

The RESTORE instruction adds one (modulo 8) to the Current Window Pointer
(CWP) of the PSR and compares this value (new_CWP) against the Window
Invalid Mask (WIM) register. If the WIM bit corresponding to the new_CWP is 0,
the new_CWP is written into the CWP field of the PSR. This causes the CWP+1
window to become the current window, thereby restoring the caller’s window. If
the WIM bit corresponding to the new_CWP is 1, a window_underflow trap is
generated and the CWP is left unchanged.

If an underflow trap is not generated, RESTORE behaves like an ADD instruction
except that the source operands r[rs1] and r[rs2] are read from the old window
and the sum is written into r[rd] of the new window.
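
The window-pointer check can be sketched in Python as follows. This is a simplified model under assumptions: the names `restore` and `WindowUnderflow` are hypothetical, the WIM is treated as an integer bit mask in which bit number new_CWP marks that window invalid, and the register windows themselves are not modeled (the function just returns the sum that would be written to r[rd] in the new window).

```python
MASK32 = 0xFFFFFFFF
NWINDOWS = 8

class WindowUnderflow(Exception):
    """Stands in for the window_underflow trap (illustrative)."""

def restore(cwp, wim, rs1_value, operand2):
    """Return (new_cwp, value for r[rd]); raise if the new window is invalid."""
    new_cwp = (cwp + 1) % NWINDOWS
    if (wim >> new_cwp) & 1:
        raise WindowUnderflow(new_cwp)   # CWP is left unchanged
    return new_cwp, (rs1_value + operand2) & MASK32
```

With CWP = 7 the pointer wraps to window 0, so a WIM with bit 0 set would cause the trap in that case.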


Format:
 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12          5 | 4   0
   10  |  rd   | 111101 |  rs1  | i=0 | unused (zero) |  rs2

 31 30 | 29 25 | 24  19 | 18 14 | 13  | 12                  0
   10  |  rd   | 111101 |  rs1  | i=1 |       simm13

Syntax:
restore reg_rs1, reg_rs2, reg_rd
restore reg_rs1, immediate, reg_rd

Traps:
window_underflow

Condition Code Modified:
(none)

Example:
ret ! return from non-leaf subroutine
restore %i5, %l1, %o5 ! add number of samples processed with this call
! to running total kept in callee’s reg i5
! and same register, caller’s reg o5.
RETT

Return from Trap Instruction

Description:

If RETT does not cause a trap, it adds 1 to the CWP (modulo 8), causes a delayed
control transfer to the target address, restores the S field of the PSR from the PS
field, and sets the ET field of the PSR to 1. The target address is “r[rs1] + r[rs2]” if
the i field is zero, or “r[rs1] + sign_ext(simm13)” if the i field is one.

RETT can cause one of several traps. In order of highest to lowest priority:

• If traps are enabled (ET=1) and the processor is in user mode (S=0), a
privileged_instruction trap occurs.

• If traps are enabled (ET=1) and the processor is in supervisor mode (S=1), an
illegal_instruction trap occurs.

• If traps are disabled (ET=0) and the processor is in user mode (S=0), the
privileged_instruction trap code is placed in the tt (trap type) field of the TBR
and the processor enters error_mode state.

• If traps are disabled (ET=0) and a window underflow condition is detected, the
window_underflow trap code is placed in the tt (trap type) field of the TBR and
the processor enters error_mode state.

• If traps are disabled (ET=0) and either of the low-order two bits of the target
address is nonzero, then the mem_address_not_aligned code is placed in the tt
(trap type) field of the TBR and the processor enters error_mode state.
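
The priority order of these checks can be expressed as a short Python function. This is a hedged sketch: the function name `rett_outcome` and its return convention are invented for illustration, the window check is passed in as a precomputed flag, and the ET=1, S=1 case is assumed to raise illegal_instruction, as the Traps list for RETT suggests.

```python
def rett_outcome(et, s, new_window_invalid, target):
    """Return None if RETT completes, else (kind, trap_name) in priority order."""
    if et == 1 and s == 0:
        return ("trap", "privileged_instruction")
    if et == 1 and s == 1:
        return ("trap", "illegal_instruction")   # assumed, see lead-in
    # From here on traps are disabled (ET=0): failures enter error_mode.
    if s == 0:
        return ("error_mode", "privileged_instruction")
    if new_window_invalid:
        return ("error_mode", "window_underflow")
    if target & 0x3:
        return ("error_mode", "mem_address_not_aligned")
    return None
```

Only the case ET=0, S=1 with a valid window and a word-aligned target lets the instruction complete, which matches its intended use at the end of a trap handler.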


The instruction executed immediately before an RETT must be a JMPL instruc- 
tion. 


RETT is a privileged instruction. 


Format:
 31 30 | 29 25   | 24  19 | 18 14 | 13  | 12          5 | 4   0
   10  | reserved | 111001 |  rs1  | i=0 |   reserved    |  rs2

 31 30 | 29 25   | 24  19 | 18 14 | 13  | 12                  0
   10  | reserved | 111001 |  rs1  | i=1 |       simm13

Syntax:
rett reg_rs1 + reg_rs2
rett reg_rs1 + immediate

Traps:
privileged_instruction
illegal_instruction
window_underflow
mem_address_not_aligned

Condition Code Modified:
(none)

Example:

To re-execute the trapped instruction when returning from the trap handler use
the sequence:

jmpl %r17, %r0 ! old PC
rett %r18 ! old nPC

To return to the instruction after the trapped instruction (for example, after
emulating an instruction) use the sequence:

jmpl %r18, %r0 ! old nPC
rett %r18+4 ! old nPC + 4
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SAVE 


Save Caller’s Window 





Description: 


The SAVE instruction subtracts one (modulo 8) from the Current Window Pointer 
(CWP) of the PSR and compares this value (new_CWP) against the Window 
Invalid Mask (WIM) register. If the WIM bit corresponding to the new_CWP is 0, 
the new_CWP is written into the CWP field of the PSR. This causes the CWP -1 
window to become the current window, thereby saving the caller’s window. 
Otherwise a window_overflow trap is generated and the CWP is left unchanged. 


If an overflow trap is not generated, SAVE behaves like an ADD instruction 
except that the source operands r[rs1] and r[rs2] are read from the old window 
and the sum is written into r[rd] of the new window. 
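The window check and the ADD-like behavior described above can be sketched as a small software model. This is only an illustration under stated assumptions: the function name `save`, the `WindowOverflow` exception, and passing the operand values as arguments are all hypothetical, and the eight-window count simply follows the "modulo 8" wording of the description.

```python
NWINDOWS = 8  # the description performs CWP arithmetic modulo 8


class WindowOverflow(Exception):
    """Stands in for the window_overflow trap (hypothetical name)."""


def save(cwp, wim, rs1_val, operand2):
    """Model SAVE: return (new_cwp, result), or raise and leave cwp unchanged."""
    new_cwp = (cwp - 1) % NWINDOWS
    if (wim >> new_cwp) & 1:
        # WIM bit set for the target window: trap, CWP is not written.
        raise WindowOverflow(new_cwp)
    # Operands are read from the old window; the sum goes to r[rd] of the new one.
    result = (rs1_val + operand2) & 0xFFFFFFFF
    return new_cwp, result
```

With `cwp=0` and only WIM bit 1 set, `save(0, 0b10, 0x1000, -64)` moves to window 7 and yields the stack pointer lowered by 64 bytes.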





Format:
31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 111100 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 10 |   rd   | 111100 |  rs1   | i=1 |        simm13         |


Syntax: 
    save    reg_rs1, reg_rs2, reg_rd
    save    reg_rs1, immediate, reg_rd
Traps: 


window_overflow


Condition Code Modified: 


(none) 
Example: 
    save    %sp, -64, %sp   ! equivalent statements to make
    save    %o6, -64, %o6   ! room for 16 more words in call stack
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SCAN SCAN 







Description: 


The scan instruction returns the location of the first nonsign bit or the location of 
either the most significant one or most significant zero of source register r[rs1]. 


SCAN works as follows: 


(1) The r[rs1] value is “xored” on a bit-wise basis with the value obtained by
shifting right by one bit and sign extending the value in r[rs2].


(2) The bit position of the first “1” in the value obtained above is returned to the
destination register r[rd]. A “1” in the MSB position returns a value of 0, while
a first “1” in the LSB position returns a value of 31. If no bit is set, a value of 63 is
returned.


See figure 2-25 for additional details.
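The two steps above can be followed in a short software model. It is a sketch only, not a statement of how the hardware is built: the function name `scan` is hypothetical, and the sign-extending shift is written out explicitly because Python integers are unbounded.

```python
MASK32 = 0xFFFFFFFF


def scan(rs1_val, rs2_val):
    """Model SCAN as described in steps (1) and (2)."""
    rs1_val &= MASK32
    rs2_val &= MASK32
    # Step 1: shift r[rs2] right by one bit with sign extension, then XOR.
    shifted = (rs2_val >> 1) | (rs2_val & 0x80000000)
    x = rs1_val ^ shifted
    # Step 2: position of the first 1 counted from the MSB (MSB = 0, LSB = 31).
    if x == 0:
        return 63
    return 32 - x.bit_length()
```

Passing 0 as the second operand finds the most significant one; passing the same value for both operands finds the first bit that differs from the sign bit.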


Format: 
31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 101100 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 10 |   rd   | 101100 |  rs1   | i=1 |        simm13         |


Syntax: 
    scan    reg_rs1, reg_rs2, reg_rd
    scan    reg_rs1, immediate, reg_rd
Traps: 
(none) 


Condition Code Modified: 


(none) 
Example: 
    scan    %g1, 0, %g2     ! scan reg g1 for position of first one
                            ! from the msb end and put position
                            ! number in reg g2
    scan    %g1, %g1, %g2   ! scan reg g1 for position of first bit
                            ! that differs from msb of reg g1
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SETHI SETHI 





Description: 


SETHI zeroes the least significant 10 bits of the destination register (r[rd]), and 
replaces its high-order 22 bits with the value from the immediate field. 


A SETHI instruction with rd=0 and imm22=0 is defined to be a NOP instruction. 
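SETHI is normally paired with an OR that supplies the low-order 10 bits, as the example for this instruction does with %hi and %lo. The sketch below models that split in software; the helper names `hi`, `lo`, `sethi`, and `load_address` are hypothetical stand-ins for the assembler operators and the two-instruction sequence.

```python
MASK32 = 0xFFFFFFFF


def hi(value):
    """Models %hi(value): the high-order 22 bits."""
    return (value & MASK32) >> 10


def lo(value):
    """Models %lo(value): the low-order 10 bits."""
    return value & 0x3FF


def sethi(imm22):
    """Model SETHI: imm22 into bits 31..10, low 10 bits zeroed."""
    return (imm22 & 0x3FFFFF) << 10


def load_address(value):
    """sethi %hi(value), reg followed by or reg, %lo(value), reg."""
    reg = sethi(hi(value))
    return reg | lo(value)
```

For any 32-bit value, `load_address(value)` returns the value unchanged, which is why the pair can build an arbitrary address or constant.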


Format: 


31 30 29    25 24  22 21                          0
| 00 |   rd   | 100  |          imm22             |





Syntax: 
    sethi   const22, reg_rd
    sethi   %hi(value), reg_rd
Traps: 
(none) 


Condition Code Modified: 


(none) 
Example: 
    sethi   %hi(label_trig_table), %l7
    or      %l7, %lo(label_trig_table), %l7     ! %l7 = address pointer of
                                                ! label_trig_table


Instruction Set - Set High 22 bits 
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SLL 


SLL 


Shift Left Logical 





Description: 


SLL shifts the value of r[rs1] left by the count specified by the lower 5 bits of either 
“r[rs2]” if the i field is zero, or “simm13” if the i field is one. The vacated positions 
(least significant bits) are filled with zeroes. The shifted result is placed in the r 
register specified by the rd field. 


Format: 
31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 100101 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 100101 |  rs1   | i=1 | unused (zero) | shcnt |


Syntax: 

    sll     reg_rs1, reg_rs2, reg_rd
    sll     reg_rs1, immediate, reg_rd
Traps: 

(none) 


Condition Code Modified: 


(none) 
Example: 

    sll     %l1, %g1, %o1   ! left justify least significant part of reg l1
                            ! by shift count in reg g1
    sub     %g0, %g1, %g1   ! negate reg g1
    srl     %l1, %g1, %o0   ! right justify most significant part of reg l1
                            ! by 32 - original shift count
    or      %o0, %o1, %o0   ! join parts to complete left rotate by
                            ! original shift count
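The rotate sequence in this example relies on the shifter using only the lower 5 bits of the count, so the negated count acts as 32 minus the original count. A minimal software sketch of that reasoning follows; the function names are hypothetical and the model covers only the four instructions shown.

```python
MASK32 = 0xFFFFFFFF


def sll(value, count):
    """Shift left logical; only the lower 5 bits of count are used."""
    return (value << (count & 31)) & MASK32


def srl(value, count):
    """Shift right logical; only the lower 5 bits of count are used."""
    return (value & MASK32) >> (count & 31)


def rotate_left(value, count):
    o1 = sll(value, count)             # sll %l1, %g1, %o1
    negated = (0 - count) & MASK32     # sub %g0, %g1, %g1
    o0 = srl(value, negated)           # srl %l1, %g1, %o0
    return o0 | o1                     # or  %o0, %o1, %o0
```

A count of zero is handled as well: both shifts then leave the value untouched and the OR returns it unchanged.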
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SMUL | SMUL 


Signed Integer Multiply 





Description: 


SMUL performs either “r[rs1] x r[rs2]” if the i field is zero, or “r[rs1] x
sign_ext(simm13)” if the i field is one. The 32 least significant bits of the product
are written to the destination register r[rd]. The most significant bits of the
product are written to the Y register.


The SMUL operation takes 5 cycles to compute a 32 bit x word operation, 3 cycles 
to compute a 32 bit x halfword operation, and 2 cycles to compute a 32 bit x byte 
operation. To do this, the hardware tests the most significant 16, 24 or 32 bits of 
r[rs2] against the sign bit at run time. If the bits match, the SMUL instruction will 
terminate in 3, 2 or 1 cycle respectively. 


SMUL assumes a signed integer word operand and computes a signed integer
doubleword product.
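How the doubleword product is divided between r[rd] and the Y register can be shown with a short software model. This is a sketch under assumptions: the names `to_signed32` and `smul` are hypothetical, and the cycle counts discussed above are not modeled.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def smul(rs1_val, operand2):
    """Return (rd, y): low and high words of the signed 64-bit product."""
    product = to_signed32(rs1_val) * to_signed32(operand2)
    product &= 0xFFFFFFFFFFFFFFFF      # keep the 64-bit two's complement pattern
    return product & MASK32, product >> 32
```

For example, multiplying -1 by 2 gives a low word of 0xFFFFFFFE and a Y value of 0xFFFFFFFF, the sign extension of the result.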
Format: 

31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 001011 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 10 |   rd   | 001011 |  rs1   | i=1 |        simm13         |


Syntax: 

    smul    reg_rs1, reg_rs2, reg_rd
    smul    reg_rs1, immediate, reg_rd
Traps: 

(none) 


Condition Code Modified: 


(none) 

Example: 
    smul    %o2, %o3, %o1   ! least significant half product to %o1
    rd      %y, %o0         ! most significant half product to %o0


Instruction Set - Signed Integer Multiply 
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SMULcc 


SMULcc 


Signed Integer Multiply and Change Condition Codes 





Description:


SMULcc performs either “r[rs1] x r[rs2]” if the i field is zero, or “r[rs1] x
sign_ext(simm13)” if the i field is one. The 32 least significant bits of the product
are written to the destination register r[rd]. The most significant bits of the
product are written to the Y register.


The SMUL operation takes 5 cycles to compute a 32 bit x word operation, 3 cycles 
to compute a 32 bit x halfword operation, and 2 cycles to compute a 32 bit x byte 
operation. To do this, the hardware tests the most significant 16, 24 or 32 bits of 
r[rs2] against the sign bit at run time. If the bits match, the SMUL instruction will 
terminate in 3, 2 or 1 cycle respectively. 


SMULcc assumes a signed integer word operand and computes a signed integer 
doubleword product. SMULcc writes the integer condition code (see below). 


Format:
31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 011011 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 10 |   rd   | 011011 |  rs1   | i=1 |        simm13         |


Syntax: 
    smulcc  reg_rs1, reg_rs2, reg_rd
    smulcc  reg_rs1, immediate, reg_rd


Instruction Set - Signed Integer Multiply and Change Condition Codes 
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Traps: 


(none) 


Condition Code Modified: 


    n   Set if product [31] = 1
    z   Set if product [31:0] = 0
    v   Zero
    c   Zero









Example: 


    smulcc  %o2, %o3, %o1   ! least significant half product to %o1
    rd      %y, %o0         ! most significant half product to %o0
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SRA SRA 





Description: 


SRA shifts the value of r[rs1] right by the count specified by the lower 5 bits of 
either “r[rs2]” if the i field is zero, or “simm13” if the i field is one. The vacated 
positions (most significant bits) are filled with the most significant bit of r[rs1]. 
The shifted result is placed in the r register specified by the rd field. 
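The difference between filling the vacated positions with the sign bit (SRA) and with zeroes (SRL) can be seen in a small software model. The function names are hypothetical; the sketch relies on Python's right shift being arithmetic for negative integers.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def sra(value, count):
    """Shift right arithmetic: vacated bits copy the most significant bit."""
    return (to_signed32(value) >> (count & 31)) & MASK32


def srl(value, count):
    """Shift right logical: vacated bits are filled with zeroes."""
    return (value & MASK32) >> (count & 31)
```

Both functions mask the count to its lower 5 bits, so a count of 36 behaves like a count of 4.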


Format: 
31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 100111 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 100111 |  rs1   | i=1 | unused (zero) | shcnt |


Syntax: 
    sra     reg_rs1, reg_rs2, reg_rd
    sra     reg_rs1, immediate, reg_rd
Traps: 
(none) 


Condition Code Modified: 


(none) 


Example: 


    sra     %g1, 4, %g1     ! right shift reg g1 4 bits and extend sign


Instruction Set - Shift Right Arithmetic 
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SRL 





Description: 


SRL shifts the value of r[rs1] right by the count specified by the lower 5 bits of
either “r[rs2]” if the i field is zero, or “simm13” if the i field is one. The vacated
positions (most significant bits) are filled with zeroes. The shifted result is placed
in the r register specified by the rd field.


Format: 
31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 100110 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 100110 |  rs1   | i=1 | unused (zero) | shcnt |


Syntax: 
    srl     reg_rs1, reg_rs2, reg_rd
    srl     reg_rs1, immediate, reg_rd
Traps: 
(none) 


Condition Code Modified: 


(none) 
Example: 

    sll     %l1, %g1, %o1   ! left justify least significant part of reg l1
                            ! by shift count in reg g1
    sub     %g0, %g1, %g1   ! negate reg g1
    srl     %l1, %g1, %o0   ! right justify most significant part of reg l1
                            ! by 32 - original shift count
    or      %o0, %o1, %o0   ! join parts to complete left rotate by
                            ! original shift count
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ST 


ST 


Store Word 





Description: 


The ST instruction moves a word from the r register specified by the rd field into
memory. The effective memory address is either “r[rs1] + r[rs2]” if the i field is
zero, or “r[rs1] + sign_ext(simm13)” if the i field is one. If the ST instruction traps,
memory remains unchanged.


The address space identifier (ASI) indicates either user data (0xA) or supervisor
data (0xB) according to the S bit of the PSR.
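The address calculation and ASI selection described above can be sketched in software as follows. All function names are hypothetical. The alignment helper assumes that word alignment means both low-order address bits are zero, which is the condition the manual gives for the not-aligned trap elsewhere; it is an assumption that ST uses the same test.

```python
def sign_ext13(simm13):
    """Sign-extend a 13-bit immediate field."""
    simm13 &= 0x1FFF
    return simm13 - 0x2000 if simm13 & 0x1000 else simm13


def st_effective_address(rs1_val, operand2, i_field):
    """operand2 is r[rs2] when i_field is 0, or the raw simm13 field when 1."""
    offset = sign_ext13(operand2) if i_field else operand2
    return (rs1_val + offset) & 0xFFFFFFFF


def st_asi(s_bit):
    """User data (0xA) when S=0, supervisor data (0xB) when S=1."""
    return 0xB if s_bit else 0xA


def is_word_aligned(address):
    """Assumed alignment test: both low-order address bits are zero."""
    return (address & 0x3) == 0
```

With r[rs1] = 0 and an immediate of 0xfe0, as in the example for this instruction, the effective address is 0xfe0.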


Format:
31 30 29    25 24     19 18    14  13   12             5 4     0
| 11 |   rd   | 000100 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 11 |   rd   | 000100 |  rs1   | i=1 |        simm13         |


Syntax: 

    st      [reg_rs1 + reg_rs2], reg_rd
    st      [reg_rs1 +/- immediate], reg_rd
Traps: 


mem_address_not_aligned 
data_access_exception 


Condition Code Modified: 


(none) 
Example: 
    st      [%g0 + 0xfe0], %l4
    st      [0xfe0], %l4    ! recognized as equivalent
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STA 


Description: 


The STA instruction moves a word from the r register specified by the rd field into 
memory. The source value is stored to “r[rs1] + r[rs2]” with the ASI field designat- 
ing the ASI value. If the STA instruction traps, memory remains unchanged. STA 
is privileged and may only be executed in supervisor mode. 


Format: 
31 30 29    25 24     19 18    14  13   12     5 4     0
| 11 |   rd   | 010100 |  rs1   | i=0 |   ASI   |  rs2  |
Syntax:
    sta     [reg_rs1 + reg_rs2] asi, reg_rd
Traps: 


mem_address_not_aligned 
data_access_exception 

illegal_instruction (if i=1) 
privileged_instruction (if not supervisor mode) 


Condition Code Modified: 


(none) 
Example: 
    sta     [%l1 + %l2] 0xf, %l4    ! ASI value 15 decimal


Instruction Set - Store Word in Alternate Space 
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STB STB 





Description: 


The STB instruction moves the least significant byte from the r register specified
by the rd field into memory. The effective memory address is either “r[rs1] +
r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if the i field is one. If the
STB instruction traps, memory remains unchanged.


The address space identifier (ASI) indicates either user data (0xA) or supervisor
data (0xB) according to the S bit of the PSR.


Format: 
31 30 29    25 24     19 18    14  13   12             5 4     0
| 11 |   rd   | 000101 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 11 |   rd   | 000101 |  rs1   | i=1 |        simm13         |


Syntax: 

    stb     [reg_rs1 + reg_rs2], reg_rd
    stb     [reg_rs1 +/- immediate], reg_rd
Traps: 


data_access_exception 


Condition Code Modified: 


(none) 
Example: 
    stb     [%i5 + %l2], %g2
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STBA 


Description: 


The STBA instruction moves the least significant byte from the r register specified 
by the rd field into memory. The source value is stored to “r[rs1] + r[rs2]” with the 
ASI field designating the ASI value. If the STBA instruction traps, memory 
remains unchanged. STBA is privileged and may only be executed in supervisor 
mode. 


Format: 


31 30 29    25 24     19 18    14  13   12     5 4     0
| 11 |   rd   | 010101 |  rs1   | i=0 |   ASI   |  rs2  |


Syntax: 


    stba    [reg_rs1 + reg_rs2] asi, reg_rd


Traps: 
data_access_exception 
illegal_instruction (if i=1)
privileged_instruction (if not supervisor mode) 


Condition Code Modified: 


(none) 


Example: 


stbha Loot: DLOx ly O04 
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STH STH 







Description: 


The STH instruction moves the least significant halfword from the r register
specified by the rd field into memory. The effective memory address is either “r[rs1] +
r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if the i field is one. If the
STH instruction traps, memory remains unchanged.


The address space identifier (ASI) indicates either user data (0xA) or supervisor
data (0xB) according to the S bit of the PSR.


Format:
31 30 29    25 24     19 18    14  13   12             5 4     0
| 11 |   rd   | 000110 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 11 |   rd   | 000110 |  rs1   | i=1 |        simm13         |


Syntax: 

    sth     [reg_rs1 + reg_rs2], reg_rd
    sth     [reg_rs1 +/- immediate], reg_rd
Traps: 


data_access_exception 
mem_address_not_aligned 


Condition Code Modified: 


(none) 
Example: 
    sth     [%g0 + 0xfe0], %l4


Instruction Set - Store Halfword 
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STHA 
Store Halfword in Alternate Space 


Description: 


The STHA instruction moves the least significant halfword from the r register
specified by the rd field into memory. The source value is stored to “r[rs1] + r[rs2]”
with the ASI field designating the ASI value. If the STHA instruction traps, memory
remains unchanged. STHA is privileged and may only be executed in supervisor
mode.


Format: 


31 30 29    25 24     19 18    14  13   12     5 4     0
| 11 |   rd   | 010110 |  rs1   | i=0 |   ASI   |  rs2  |
Syntax: 


    stha    [reg_rs1 + reg_rs2] asi, reg_rd


Traps: 
data_access_exception 
illegal_instruction (if i=1) 
mem_address_not_aligned 
privileged_instruction (if not supervisor mode) 


Condition Code Modified: 


(none) 
Example: 
    stha    [%l2 + %l3] 0x3, %i4


Instruction Set - Store Halfword in Alternate Space 
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STD 


STD 


Store Doubleword





Description: 


The STD instruction moves a doubleword from an even/next-odd r register pair
into memory. The even r register (which contains the most significant word) is
written into memory at the effective address and the odd r register (with the least
significant word) is written into memory at the effective address + 4. The effective
memory address is either “r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] +
sign_ext(simm13)” if the i field is one.


The address space identifier (ASI) indicates either user data (0xA) or supervisor
data (0xB) according to the S bit of the PSR.


If the STD instruction traps while writing the first word to memory, memory
remains unchanged. If the STD instruction traps while the second word is being
written, the first word written (the most significant word, at the lower address)
will have been changed.
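The placement of the register pair in memory can be illustrated with a small model. It is a sketch under assumptions: memory is a dictionary keyed by word address, `regs` maps register numbers to values, the names are hypothetical, and the doubleword (8-byte) alignment check is an assumption about when mem_address_not_aligned applies.

```python
MASK32 = 0xFFFFFFFF


class MemAddressNotAligned(Exception):
    """Stands in for the mem_address_not_aligned trap (hypothetical name)."""


def std(memory, regs, rd, address):
    """Model STD for an even rd: even register first, odd register at +4."""
    if address & 0x7:
        # Assumed: a doubleword store needs a doubleword-aligned address.
        raise MemAddressNotAligned(hex(address))
    memory[address] = regs[rd] & MASK32          # most significant word
    memory[address + 4] = regs[rd + 1] & MASK32  # least significant word
```

Storing the pair r4/r5 at address 0x100 therefore leaves the contents of r4 at 0x100 and the contents of r5 at 0x104.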


Format:
31 30 29    25 24     19 18    14  13   12             5 4     0
| 11 |   rd   | 000111 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 11 |   rd   | 000111 |  rs1   | i=1 |        simm13         |


Syntax: 

    std     [reg_rs1 + reg_rs2], reg_rd
    std     [reg_rs1 +/- immediate], reg_rd
Traps: 


data_access_exception 
mem_address_not_aligned 


Condition Code Modified: 


(none) 
Example: 
std eS SoA] 4 “S02 
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STDA STDA 


Store Doubleword in Alternate Space





Description: 


The STDA instruction moves a doubleword from an even/next-odd r register pair 
into memory. The even r register (which contains the most significant word) is 
written into memory at the effective address and the odd r register (with the least 
significant word) is written into memory at the effective address + 4. The source 
value is stored to “r[rs1] + r[rs2]” with the ASI field designating the ASI value. 
STDA is privileged and may only be executed in supervisor mode. 


If the STDA instruction traps while writing the first word to memory, memory
remains unchanged. If the STDA instruction traps while the second word is being
written, the first word written (the most significant word, at the lower address)
will have been changed.





Format: 
31 30 29    25 24     19 18    14  13   12     5 4     0
| 11 |   rd   | 010111 |  rs1   | i=0 |   ASI   |  rs2  |
Syntax:
    stda    [reg_rs1 + reg_rs2] asi, reg_rd
Traps: 


data_access_exception 

illegal_instruction (if i=1) 
mem_address_not_aligned 
privileged_instruction (if not supervisor mode) 


Condition Code Modified: 


(none) 
Example: 
    stda    [%l2 + %l3] 0x3, %l4
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SUB | SUB 


Subtract 







Description: 


Computes either “r[rs1]-r[rs2]” if the i field is zero, or “r[rs1] - sign_ext(simm13)” 
if the i field is one, and places the result in the destination specified by the rd field. 


Format: 
31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 000100 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 10 |   rd   | 000100 |  rs1   | i=1 |        simm13         |
Syntax:
    sub     reg_rs1, reg_rs2, reg_rd
    sub     reg_rs1, immediate, reg_rd
Traps: 
(none) 
Condition Code Modified: 
(none) 
Example: 
    mov     4, %l1
    mov     2, %l2
    sub     %l1, %l2, %l3   ! %l3 = 2
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Subtract 
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SUBcc 


Description: 


Computes either “r[rs1]-r[rs2]” if the i field is zero, or “r[rs1] - sign_ext(simm13)”
if the i field is one, and places the result in the destination specified by the rd field.


SUBcc modifies the integer condition codes. Overflow occurs on subtraction if the 
operands have different signs and the sign of the difference differs from the sign 
of r[rs1]. 
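The overflow rule stated above, together with the other condition codes, can be checked with a short software model. This is a sketch: the function name `subcc` is hypothetical, and treating the c bit as the unsigned borrow is an assumption that is consistent with the nzvc values shown in the example for this instruction.

```python
MASK32 = 0xFFFFFFFF


def subcc(a, b):
    """Return (result, (n, z, v, c)) for a 32-bit subtract."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31
    z = int(result == 0)
    # Overflow: operand signs differ and the result sign differs from r[rs1].
    v = ((a ^ b) & (a ^ result)) >> 31
    # Assumed: c reflects an unsigned borrow.
    c = int(a < b)
    return result, (n, z, v, c)
```

Subtracting 7 from 4 gives -3 (0xFFFFFFFD) with nzvc = 1001, and subtracting 2 from 4 gives 2 with nzvc = 0000.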


Format: 
31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 010100 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 10 |   rd   | 010100 |  rs1   | i=1 |        simm13         |
Syntax: 
    subcc   reg_rs1, reg_rs2, reg_rd
    subcc   reg_rs1, immediate, reg_rd
Traps: 
(none) 
Condition Code Modified: 
n, z, v, c
Example: 
    mov     4, %l1
    subcc   %l1, 2, %l3     ! %l3 = 2
                            ! nzvc = 0000
    subcc   %l1, 7, %l4     ! %l4 = -3
                            ! nzvc = 1001
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SUBX 


SUBX 


Subtract with Carry 





Description: 


Computes either “r[rs1]-r[rs2]-c” if the i field is zero, or “r[rs1] -
sign_ext(simm13)-c” if the i field is one, and places the result in the destination
specified by the rd field.


Format: 


31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 001100 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 10 |   rd   | 001100 |  rs1   | i=1 |        simm13         |
Syntax: 


    subx    reg_rs1, reg_rs2, reg_rd
    subx    reg_rs1, immediate, reg_rd


Traps: 
(none) 

Condition Code Modified: 
(none) 


Example: 


    subcc   %g0, 255, %g3   ! reg g3 = -255, nzvc = 1001
    subx    %g0, 0, %g2     ! reg g2 = -1, sign extended


Instruction Set - Subtract with Carry 
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SUB Xcc SUBXcc 
Subtract and modify icc 


Description: 


Computes either “r[rs1]-r[rs2]-c” if the i field is zero, or “r[rs1] -
sign_ext(simm13)-c” if the i field is one, and places the result in the destination
specified by the rd field.


SUBXcc modifies the integer condition codes. Overflow occurs on subtraction if 
the operands have different signs and the sign of the difference differs from the 
sign of r[rs1]. 
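A subtract that consumes the carry bit is what makes multiword arithmetic possible. The sketch below models SUBXcc in software and chains two subtractions into a 64-bit one. The names `subxcc` and `sub64` are hypothetical, and treating c as the unsigned borrow is an assumption that matches the nzvc values in the example for this instruction.

```python
MASK32 = 0xFFFFFFFF


def subxcc(a, b, carry_in):
    """Return (result, (n, z, v, c)) for a - b - carry_in on 32 bits."""
    a &= MASK32
    b &= MASK32
    result = (a - b - carry_in) & MASK32
    n = result >> 31
    z = int(result == 0)
    v = ((a ^ b) & (a ^ result)) >> 31   # overflow rule from the description
    c = int(a < b + carry_in)            # assumed: c is the unsigned borrow
    return result, (n, z, v, c)


def sub64(a_hi, a_lo, b_hi, b_lo):
    """64-bit subtract: low words first, then high words with the borrow."""
    lo, (_, _, _, borrow) = subxcc(a_lo, b_lo, 0)
    hi, _ = subxcc(a_hi, b_hi, borrow)
    return hi, lo
```

With r[rs1] = 0x7fffffff and r[rs2] = 0xffffffff, the model gives 0x80000000 with nzvc = 1011 when c is 0, and 0x7fffffff with nzvc = 0001 when c is 1.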


Format: 
31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 011100 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 10 |   rd   | 011100 |  rs1   | i=1 |        simm13         |





Syntax: 
    subxcc  reg_rs1, reg_rs2, reg_rd
    subxcc  reg_rs1, immediate, reg_rd
Traps: 
(none) 


Condition Code Modified: 


n, z, v, c
Example: 
    mov     -1, %l1         ! reg l1 = 0xffffffff
    srl     %l1, 1, %l2     ! reg l2 = 0x7fffffff
    orcc    %g0, 0, %g0     ! nzvc = 0100
    subxcc  %l2, %l1, %g1   ! reg g1 = 0x80000000, nzvc = 1011
    subxcc  %l2, %l1, %g2   ! reg g2 = 0x7fffffff, nzvc = 0001


Instruction Set - Subtract and modify icc 
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SWAP 





SWAP 


SWAP Register with Memory 





Description: 


The SWAP instruction exchanges the contents of the r register identified by the rd 
field with the contents of the addressed memory location. This is performed 
atomically without allowing intervening asynchronous traps. 


The effective address of the swap instruction is either “r[rs1] + r[rs2]” if the i field
is zero, or “r[rs1] + sign_ext(simm13)” if the i field is one.


If the SWAP instruction traps, memory remains unchanged. 
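Because the exchange is atomic, SWAP is commonly used to build a simple lock. The sketch below models the exchange and one such use. It is illustrative only: the function names, the dictionary used as memory, and the convention that 0 means free and 1 means held are all assumptions, not something the manual prescribes.

```python
def swap(memory, address, reg_value):
    """Model SWAP: store reg_value and return the previous memory word."""
    old = memory[address]
    memory[address] = reg_value
    return old


def try_acquire(memory, lock_address):
    """Hypothetical lock convention: 0 = free, 1 = held."""
    return swap(memory, lock_address, 1) == 0


def release(memory, lock_address):
    memory[lock_address] = 0
```

A first call to `try_acquire` on a free lock succeeds; a second call fails until `release` stores 0 again.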


Format: 
31 30 29    25 24     19 18    14  13   12             5 4     0
| 11 |   rd   | 001111 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 11 |   rd   | 001111 |  rs1   | i=1 |        simm13         |


Syntax: 
    swap    [reg_rs1 + reg_rs2], reg_rd
    swap    [reg_rs1 +/- immediate], reg_rd
Traps: 


data_access_exception 
mem_address_not_aligned 


Condition Code Modified: 


(none) 
Example:
    swap    [%g7-23], %g6


Instruction Set - SWAP Register with Memory 
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SWAPA SWAPA 


SWAP Register with Alternate Space Memory 





Description: 


The SWAPA instruction exchanges the r register identified by the rd field with the 
contents of the addressed memory location. This is performed atomically without 
allowing intervening asynchronous traps. 


The effective address of the swap instruction is “r[rs1] + r[rs2]” with the ASI field 
designating the ASI value. 


If the SWAPA instruction traps, memory remains unchanged. SWAPA is
privileged and may only be executed in supervisor mode.





Format: 
31 30 29    25 24     19 18    14  13   12     5 4     0
| 11 |   rd   | 011111 |  rs1   | i=0 |   ASI   |  rs2  |
Syntax:
    swapa   [reg_rs1 + reg_rs2] asi, reg_rd
Traps: 


data_access_exception 

illegal_instruction (if i=1) 
mem_address_not_aligned 
privileged_instruction (if not supervisor mode) 


Condition Code Modified: 


(none) 


Example: 


    swapa   [%i5 + %l2] 0xf, %l4


Instruction Set - SWAP Register with Alternate Space Memory 
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TA TA 





Description: 


The TA instruction generates a trap_instruction trap if no higher priority traps are
pending. The trap_instruction trap causes the tt field of the Trap Base Register
(TBR) to be written with 128 plus the least significant seven bits of either “r[rs1] +
r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if the i field is one.


All bits indicated as reserved in the instruction formats should be supplied as
zero, as should the most significant 25 bits of r[rs2] if the i field is 0.


(Note: If single vector trapping is enabled, the trap_instruction trap will vector to
the location pointed to by the Trap Base Address in the TBR, and the tt field will
be ignored.)

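The trap type written to the tt field follows directly from the rule above: 128 plus the low seven bits of the operand sum. A minimal sketch, with the hypothetical function name `trap_type`:

```python
def trap_type(rs1_val, operand2):
    """tt value for a trap_instruction trap: 128 + low 7 bits of the sum."""
    return 128 + ((rs1_val + operand2) & 0x7F)
```

With %g0 (always zero) as the first operand and an immediate of 35, the result is 163; any bits above the low seven are discarded, so an operand sum of 0x85 yields 133.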
Format: 
31 30   29       28  25 24     19 18    14  13   12       5 4     0
| 10 | reserved | 1000 | 111010 |  rs1   | i=0 | reserved |  rs2  |

31 30   29       28  25 24     19 18    14  13   12       7 6               0
| 10 | reserved | 1000 | 111010 |  rs1   | i=1 | reserved | software trap # |


Syntax: 

    ta      reg_rs1, reg_rs2
    ta      reg_rs1, immediate
Traps: 


trap_instruction


Condition Code Modified: 


(none) 


Example: 


    ta      %g0 + 35        ! tt = 163


Instruction Set - Trap Always (Trap on Zero) 
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TADDcc TADDcc 





Description: 


The TADDcc instruction computes either “r[rs1] + r[rs2]” if the i field is zero, or
“r[rs1] + sign_ext(simm13)” if the i field is one. An overflow condition exists if bit
1 or 0 of either operand is not zero, or if the addition generates an arithmetic
overflow.


If TADDcc causes an overflow condition, the overflow bit (v) of the PSR is set; if it
does not cause an overflow, the overflow bit is cleared. In either case, the
remaining integer condition codes are also updated and the result of the addition is
written into the r register specified by the rd field.
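The tagged overflow rule can be made concrete with a short software model. It is a sketch: the name `taddcc` is hypothetical, and the n, z, and c bits are computed the way an ordinary 32-bit add would set them, which is an assumption about the "remaining integer condition codes".

```python
MASK32 = 0xFFFFFFFF


def taddcc(a, b):
    """Return (result, (n, z, v, c)) for a tagged add."""
    a &= MASK32
    b &= MASK32
    total = a + b
    result = total & MASK32
    # Arithmetic overflow: operands share a sign that the result does not.
    arith_overflow = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    # Tag overflow: bit 1 or bit 0 of either operand is not zero.
    tag_overflow = int(((a | b) & 0x3) != 0)
    v = arith_overflow | tag_overflow
    n = result >> 31
    z = int(result == 0)
    c = total >> 32
    return result, (n, z, v, c)
```

Adding 1 to 0 sets v because the operand 1 has a nonzero tag bit, giving nzvc = 0010, while adding 4 and 8 leaves all four bits clear.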





Format: 
31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 100000 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 10 |   rd   | 100000 |  rs1   | i=1 |        simm13         |


Syntax: 


    taddcc  reg_rs1, reg_rs2, reg_rd
    taddcc  reg_rs1, immediate, reg_rd


Traps: 


(none) 


Condition Code Modified: 


n, z, v, c
Example: 
    taddcc  %g0, 1, %o0     ! nzvc = 0010


Instruction Set - Tagged Add and modify icc 
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TADDccTV TADDccTV 


Tagged Add and modify icc and Trap on Overflow 




Description: 


The TADDccTV instruction computes either “r[rs1] + r[rs2]” if the i field is zero,
or “r[rs1] + sign_ext(simm13)” if the i field is one. An overflow condition exists if
bit 1 or 0 of either operand is not zero, or if the addition generates an arithmetic
overflow.


If TADDccTV causes an overflow condition, a tag_overflow trap is generated and
the destination register and condition codes remain unchanged. If TADDccTV
does not cause an overflow condition, all the integer condition codes are updated
(in particular, the overflow bit (v) is set to 0) and the result of the addition is
written into the r register specified by the rd field.


Format: 
31 30 29    25 24     19 18    14  13   12             5 4     0
| 10 |   rd   | 100010 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30 29    25 24     19 18    14  13   12                     0
| 10 |   rd   | 100010 |  rs1   | i=1 |        simm13         |


Syntax: 


    taddcctv    reg_rs1, reg_rs2, reg_rd
    taddcctv    reg_rs1, immediate, reg_rd


Traps: 


tag_overflow 


Condition Code Modified: 


n, z, v, c
Example: 
    taddcctv    %g0, 1, %o0     ! nzvc = 0010


Instruction Set - Tagged Add and modify icc and Trap on Overflow 
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TCC 





Description: 


The TCC instruction causes a trap_instruction trap if (not C)=1 and if no higher
priority trap is pending. The trap_instruction trap causes the tt field of the Trap
Base Register (TBR) to be written with 128 plus the least significant seven bits of
either “r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if the i
field is one.


If (not C)=0, a trap_instruction trap does not occur and the instruction behaves
like a NOP. All bits indicated as reserved in the instruction formats should be
supplied as zero, as should the most significant 25 bits of r[rs2] if the i field is 0.


(Note: If single vector trapping is enabled, the trap_instruction trap will vector to
the location pointed to by the Trap Base Address in the TBR, and the tt field will
be ignored.)

Format: 

31 30   29       28  25 24     19 18    14  13   12            5 4     0
| 10 | reserved | 1101 | 111010 |  rs1   | i=0 | unused (zero) |  rs2  |

31 30   29       28  25 24     19 18    14  13   12       7 6               0
| 10 | reserved | 1101 | 111010 |  rs1   | i=1 | reserved | software trap # |


Syntax: 
    tcc     reg_rs1, reg_rs2
    tcc     reg_rs1, immediate
    tgeu    reg_rs1, reg_rs2        ! alternate mnemonic
    tgeu    reg_rs1, immediate      ! alternate mnemonic
Traps: 


trap_instruction 


Condition Code Modified: 


(none) 
Example: 
    tcc     %g0 + 33        ! tt = 161


Instruction Set - Trap on Carry Clear (Trap on Greater Than or Equal, Unsigned) 
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TCS 


TCS 


Trap on Carry Set (Trap on Less Than, Unsigned)





Description: 


The TCS instruction causes a trap_instruction trap if C=1 and if no higher priority
trap is pending. The trap_instruction trap causes the tt field of the Trap Base
Register (TBR) to be written with 128 plus the least significant seven bits of either
“r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if the i field is
one.


If C=0, a trap_instruction trap does not occur and the instruction behaves like a
NOP. All bits indicated as reserved in the instruction formats should be supplied
as zero, as should the most significant 25 bits of r[rs2] if the i field is 0.


(Note: If single vector trapping is enabled, the trap_instruction trap will vector to
the location pointed to by the Trap Base Address in the TBR, and the tt field will
be ignored.)


Format: 
31 30   29       28  25 24     19 18    14  13   12       5 4     0
| 10 | reserved | 0101 | 111010 |  rs1   | i=0 | reserved |  rs2  |

31 30   29       28  25 24     19 18    14  13   12       7 6               0
| 10 | reserved | 0101 | 111010 |  rs1   | i=1 | reserved | software trap # |


Syntax: 
    tcs     reg_rs1, reg_rs2
    tcs     reg_rs1, immediate
    tlu     reg_rs1, reg_rs2        ! alternate mnemonic
    tlu     reg_rs1, immediate      ! alternate mnemonic
Traps: 


trap_instruction 


Condition Code Modified: 


(none) 


Example: 


    tcs     %g0 + 34        ! tt = 162


Instruction Set - Trap on Carry Set (Trap on Less Than, Unsigned) 
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TE 


Trap on Equal 


Description: 


The TE instruction causes a trap_instruction trap if Z=1 and if no higher priority
trap is pending. The trap_instruction trap causes the tt field of the Trap Base
Register (TBR) to be written with 128 plus the least significant seven bits of either
“r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if the i field is
one.


If Z=0, a trap_instruction trap does not occur and the instruction behaves like a
NOP. All bits indicated as reserved in the instruction formats should be supplied
as zero, as should the most significant 25 bits of r[rs2] if the i field is 0.


(Note: If single vector trapping is enabled, the trap_instruction trap will vector to
the location pointed to by the Trap Base Address in the TBR, and the tt field will
be ignored.)


Format: 
31 30   29       28  25 24     19 18    14  13   12       5 4     0
| 10 | reserved | 0001 | 111010 |  rs1   | i=0 | reserved |  rs2  |

31 30   29       28  25 24     19 18    14  13   12       7 6               0
| 10 | reserved | 0001 | 111010 |  rs1   | i=1 | reserved | software trap # |


Syntax: 

    te      reg_rs1, reg_rs2
    te      reg_rs1, immediate
Traps: 


trap_instruction 


Condition Code Modified: 


(none) 


Example: 


    te      %g0 + 36        ! tt = 164


Instruction Set - Trap on Equal 
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TG 


TG 


Trap on Greater 







Description: 


The TG instruction causes a trap_instruction trap if “not(Z or (N xor V))” is true
and if no higher priority trap is pending. The trap_instruction trap causes the tt
field of the Trap Base Register (TBR) to be written with 128 plus the least
significant seven bits of either “r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] +
sign_ext(simm13)” if the i field is one.


If “not(Z or (N xor V))” is false, a trap_instruction trap does not occur and the
instruction behaves like a NOP. All bits indicated as reserved in the instruction
formats should be supplied as zero, as should the most significant 25 bits of r[rs2]
if the i field is 0.


(Note: If single vector trapping is enabled, the trap_instruction trap will vector to
the location pointed to by the Trap Base Address in the TBR, and the tt field will
be ignored.)


Format:
31 30   29       28  25 24     19 18    14  13   12       5 4     0
| 10 | reserved | 1010 | 111010 |  rs1   | i=0 | reserved |  rs2  |

31 30   29       28  25 24     19 18    14  13   12       7 6               0
| 10 | reserved | 1010 | 111010 |  rs1   | i=1 | reserved | software trap # |


Syntax: 

    tg      reg_rs1, reg_rs2
    tg      reg_rs1, immediate
Traps: 


trap_instruction 


Condition Code Modified: 


(none) 


Example: 


tg %g0+36 ! tt=164 


Instruction Set - Trap on Greater 
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TGE TGE 


Trap on Greater Than or Equal





Description: 


The TGE instruction causes a trap_instruction trap if “not(N xor V)” is true and if
no higher priority trap is pending. The trap_instruction trap causes the tt field of
the Trap Base Register (TBR) to be written with 128 plus the least significant seven
bits of either “r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if
the i field is one.


If “not(N xor V)” is false, a trap_instruction trap does not occur and the
instruction behaves like a NOP. All bits indicated as reserved in the instruction formats
should be supplied as zero, as should the most significant 25 bits of r[rs2] if the i
field is 0.


(Note: If single vector trapping is enabled, the trap_instruction trap will vector to
the location pointed to by the Trap Base Address in the TBR, and the tt field will
be ignored.)

Format: 
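31 30   29       28  25 24     19 18    14  13   12       5 4     0
| 10 | reserved | 1011 | 111010 |  rs1   | i=0 | reserved |  rs2  |

31 30   29       28  25 24     19 18    14  13   12       7 6               0
| 10 | reserved | 1011 | 111010 |  rs1   | i=1 | reserved | software trap # |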


Syntax: 
    tge     reg_rs1, reg_rs2
    tge     reg_rs1, immediate
Traps: 


trap_instruction 


Condition Code Modified: 


(none) 
Example: 
    tge     %g0+37          ! tt=165


Instruction Set - Trap on Greater Than or Equal 
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TGU 


TGU 





Description: 


The TGU instruction causes a trap_instruction trap if “not(C or Z)” is true and if
no higher priority trap is pending. The trap_instruction trap causes the tt field of
the Trap Base Register (TBR) to be written with 128 plus the least significant seven
bits of either “r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if
the i field is one.


If “not(C or Z)” is false, a trap_instruction trap does not occur and the instruction
behaves like a NOP. All bits indicated as reserved in the instruction formats
should be supplied as zero, as should the most significant 25 bits of r[rs2] if the i
field is 0.


(Note: If single vector trapping is enabled, the trap_instruction trap will vector to
the location pointed to by the Trap Base Address in the TBR, and the tt field will
be ignored.)


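Format:
  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-5     | 4-0
  i = 0:  10    | reserved | 1100  | 111010 | rs1   | i=0 | reserved | rs2

  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-7     | 6-0
  i = 1:  10    | reserved | 1100  | 111010 | rs1   | i=1 | reserved | software trap #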


Syntax: 
tgu     reg_rs1, reg_rs2
tgu     reg_rs1, immediate
Traps: 


trap_instruction 


Condition Code Modified: 


(none) 


Example: 


tgu     %g0 + 38        ! tt = 166


Instruction Set - Trap on Greater Unsigned 


TL 


Trap on Less 





Description: 


The TL instruction causes a trap_instruction trap if “N xor V” is true and if no 
higher priority trap is pending. The trap_instruction trap causes the tt field of the 
Trap Base Register (TBR) to be written with 128 plus the least significant seven 
bits of either “r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if
the i field is one.


If “N xor V” is false, a trap_instruction trap does not occur and the instruction 
behaves like a NOP. All bits indicated as reserved in the instruction formats 
should be supplied as zero as should the most significant 25 bits of r[rs2] if the i 
field is 0. 


(note: If single vector trapping is enabled, the trap_instruction trap will vector to 
the location pointed to by the Trap Base Address in the TBR, and the tt field will 
be ignored) 


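Format:
  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-5     | 4-0
  i = 0:  10    | reserved | 0011  | 111010 | rs1   | i=0 | reserved | rs2

  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-7     | 6-0
  i = 1:  10    | reserved | 0011  | 111010 | rs1   | i=1 | reserved | software trap #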


Syntax: 

tl      reg_rs1, reg_rs2
tl      reg_rs1, immediate
Traps: 


trap_instruction 


Condition Code Modified: 


(none) 


Example: 


tl      %g0 + 40        ! tt = 168


Instruction Set - Trap on Less 
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TLE 


Trap on Less Than or Equal





Description: 


The TLE instruction causes a trap_instruction trap if “Z or (N xor V)” is true and if 
no higher priority trap is pending. The trap_instruction trap causes the tt field of 
the Trap Base Register (TBR) to be written with 128 plus the least significant seven 
bits of either “r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if 
the i field is one. 


If “Z or (N xor V)” is false, a trap_instruction trap does not occur and the instruction
behaves like a NOP. All bits indicated as reserved in the instruction formats
should be supplied as zero, as should the most significant 25 bits of r[rs2] if the i
field is 0.


(note: If single vector trapping is enabled, the trap_instruction trap will vector to 
the location pointed to by the Trap Base Address in the TBR, and the tt field will 
be ignored) 


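Format:
  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-5     | 4-0
  i = 0:  10    | reserved | 0010  | 111010 | rs1   | i=0 | reserved | rs2

  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-7     | 6-0
  i = 1:  10    | reserved | 0010  | 111010 | rs1   | i=1 | reserved | software trap #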
Syntax: 
tle     reg_rs1, reg_rs2
tle     reg_rs1, immediate
Traps: 
trap_instruction 
Condition Code Modified: 
(none)
Example:
tle     %g0 + 41        ! tt = 169


Instruction Set - Trap on Less Than or Equal 
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TLEU TLEU 


Trap on Less Than or Equal Unsigned





Description: 


The TLEU instruction causes a trap_instruction trap if “C or Z” is true and if no higher
priority trap is pending. The trap_instruction trap causes the tt field of the Trap 
Base Register (TBR) to be written with 128 plus the least significant seven bits of 
either “r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if the i 
field is one. 


If “C or Z” is false, a trap_instruction trap does not occur and the instruction 
behaves like a NOP. All bits indicated as reserved in the instruction formats 
should be supplied as zero as should the most significant 25 bits of r[rs2] if the i 
field is 0. 


(note: If single vector trapping is enabled, the trap_instruction trap will vector to
the location pointed to by the Trap Base Address in the TBR, and the tt field will
be ignored)

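Format:
  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-5     | 4-0
  i = 0:  10    | reserved | 0100  | 111010 | rs1   | i=0 | reserved | rs2

  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-7     | 6-0
  i = 1:  10    | reserved | 0100  | 111010 | rs1   | i=1 | reserved | software trap #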


Syntax: 
tleu    reg_rs1, reg_rs2
tleu    reg_rs1, immediate
Traps: 


trap_instruction 


Condition Code Modified: 


(none) 


Example: 


tleu    %g0 + 42        ! tt = 170
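
The trap conditions quoted in the descriptions of TGE, TGU, TL, TLE, and TLEU can be collected into one table and evaluated against the four integer condition code bits. The following is a minimal Python sketch for checking the logic only; the dictionary `TRAP_CONDITIONS` and the helper `traps` are hypothetical names, and each flag is passed as 0 or 1.

```python
# Each entry mirrors the boolean expression given in the instruction description.
TRAP_CONDITIONS = {
    "tge":  lambda n, z, v, c: not (n ^ v),         # not(N xor V)
    "tgu":  lambda n, z, v, c: not (c or z),        # not(C or Z)
    "tl":   lambda n, z, v, c: bool(n ^ v),         # N xor V
    "tle":  lambda n, z, v, c: bool(z or (n ^ v)),  # Z or (N xor V)
    "tleu": lambda n, z, v, c: bool(c or z),        # C or Z
}

def traps(mnemonic: str, n: int, z: int, v: int, c: int) -> bool:
    """Return True if the named trap instruction would take its trap."""
    return bool(TRAP_CONDITIONS[mnemonic](n, z, v, c))

# After comparing two equal values (Z=1, other flags clear):
print(traps("tge", 0, 1, 0, 0), traps("tl", 0, 1, 0, 0), traps("tleu", 0, 1, 0, 0))
```

The table makes the pairing visible: TGE and TL are exact complements, as are TGU and TLEU, so for any flag state exactly one instruction of each pair traps.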


Instruction Set - Trap on Less Than or Equal Unsigned 
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TN TN 





Description: 
The TN instruction acts like a “NOP”. 


All bits indicated as reserved in the instruction formats should be supplied as
zero, as should the most significant 25 bits of r[rs2] if the i field is 0.


Format:
  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-5     | 4-0
  i = 0:  10    | reserved | 0000  | 111010 | rs1   | i=0 | reserved | rs2

  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-7     | 6-0
  i = 1:  10    | reserved | 0000  | 111010 | rs1   | i=1 | reserved | software trap #


Syntax: 

tn      reg_rs1, reg_rs2
tn      reg_rs1, immediate
Traps: 


trap_instruction 


Condition Code Modified: 


(none) 
Example: 
tn      %g0 + 39        ! nop


Instruction Set - Trap Never 
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TNE 


Trap on Not Equal (Trap on Not Zero) 


Description: 


The TNE instruction causes a trap_instruction trap if Z=0 and if no higher priority
trap is pending. The trap_instruction trap causes the tt field of the Trap Base Register
(TBR) to be written with 128 plus the least significant seven bits of either
“r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if the i field is
one.


If Z=1, a trap_instruction trap does not occur and the instruction behaves like a
NOP. All bits indicated as reserved in the instruction formats should be supplied
as zero, as should the most significant 25 bits of r[rs2] if the i field is 0.


(note: If single vector trapping is enabled, the trap_instruction trap will vector to 
the location pointed to by the Trap Base Address in the TBR, and the tt field will 
be ignored) 


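Format:
  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-5     | 4-0
  i = 0:  10    | reserved | 1001  | 111010 | rs1   | i=0 | reserved | rs2

  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-7     | 6-0
  i = 1:  10    | reserved | 1001  | 111010 | rs1   | i=1 | reserved | software trap #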


Syntax: 
tne     reg_rs1, reg_rs2
tne     reg_rs1, immediate
tnz     reg_rs1, reg_rs2
tnz     reg_rs1, immediate
Traps: 


trap_instruction 


Condition Code Modified: 


(none) 
Example: 
tne     %g0 + 43        ! tt = 171
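
To see how the fields of the trap instruction format fit together, the immediate form (i = 1) can be assembled by hand. The sketch below is an illustration in Python under stated assumptions: the cond values are taken from the standard SPARC Ticc encoding, and `COND`, `OP3_TICC`, and `encode_ticc_imm` are hypothetical names rather than anything defined by this manual or by an assembler.

```python
# cond field (bits 28-25) for the trap instructions in this part of the chapter
COND = {
    "tn": 0b0000, "tle": 0b0010, "tl": 0b0011, "tleu": 0b0100,
    "tneg": 0b0110, "tvs": 0b0111, "tne": 0b1001, "tge": 0b1011,
    "tgu": 0b1100, "tpos": 0b1110, "tvc": 0b1111,
}
OP3_TICC = 0b111010  # bits 24-19

def encode_ticc_imm(mnemonic: str, rs1: int, trap_number: int) -> int:
    """Build the 32-bit word for 'mnemonic reg_rs1 + trap_number' (i = 1)."""
    if not 0 <= rs1 < 32:
        raise ValueError("rs1 must be a register number 0..31")
    if not 0 <= trap_number < 128:
        raise ValueError("software trap # must fit in 7 bits")
    return ((0b10 << 30) | (COND[mnemonic] << 25) | (OP3_TICC << 19)
            | (rs1 << 14) | (1 << 13) | trap_number)

# tne %g0 + 43
print(hex(encode_ticc_imm("tne", 0, 43)))
```

Bit 29 and bits 12-7 are left at zero, in line with the requirement that reserved bits be supplied as zero.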


Instruction Set - Trap on Not Equal (Trap on Not Zero) 
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TNEG 


Trap on Negative


Description:


The TNEG instruction causes a trap_instruction trap if N=1 and if no higher priority
trap is pending. The trap_instruction trap causes the tt field of the Trap Base
Register (TBR) to be written with 128 plus the least significant seven bits of either
“r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if the i field is
one.


If N=0, a trap_instruction trap does not occur and the instruction behaves like a
NOP. All bits indicated as reserved in the instruction formats should be supplied
as zero, as should the most significant 25 bits of r[rs2] if the i field is 0.


(note: If single vector trapping is enabled, the trap_instruction trap will vector to
the location pointed to by the Trap Base Address in the TBR, and the tt field will
be ignored)


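Format:
  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-5     | 4-0
  i = 0:  10    | reserved | 0110  | 111010 | rs1   | i=0 | reserved | rs2

  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-7     | 6-0
  i = 1:  10    | reserved | 0110  | 111010 | rs1   | i=1 | reserved | software trap #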


Syntax: 


tneg    reg_rs1, reg_rs2
tneg    reg_rs1, immediate


Traps: 


trap_instruction 


Condition Code Modified: 


(none) 
Example: 
tneg    %g0 + 44        ! tt = 172


Instruction Set - Trap on Negative 
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T P e ti 
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Description: 


The TPOS instruction causes a trap_instruction trap if N=0 and if no higher priority
trap is pending. The trap_instruction trap causes the tt field of the Trap Base
Register (TBR) to be written with 128 plus the least significant seven bits of either
“r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if the i field is
one.

If N=1, a trap_instruction trap does not occur and the instruction behaves like a
NOP. All bits indicated as reserved in the instruction formats should be supplied
as zero, as should the most significant 25 bits of r[rs2] if the i field is 0.


(note: If single vector trapping is enabled, the trap_instruction trap will vector to
the location pointed to by the Trap Base Address in the TBR, and the tt field will
be ignored)
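Format:
  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-5     | 4-0
  i = 0:  10    | reserved | 1110  | 111010 | rs1   | i=0 | reserved | rs2

  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-7     | 6-0
  i = 1:  10    | reserved | 1110  | 111010 | rs1   | i=1 | reserved | software trap #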
Syntax: 
tpos    reg_rs1, reg_rs2
tpos    reg_rs1, immediate
Traps: 
trap_instruction 
Condition Code Modified: 
(none) 
Example: 
tpos    %g0 + 45        ! tt = 173


Instruction Set - Trap on Positive 
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TSUBcc 


Tagged Subtract and modify condition codes





Description: 


Computes either “r[rs1]-r[rs2]” if the i field is zero, or “r[rs1] - sign_ext(simm13)” 
if the i field is one, and places the result in the destination specified by the rd field. 


TSUBcc modifies the condition codes. The overflow bit of the PSR is set if bit 1 or 
bit 0 of either operand is nonzero. The overflow bit is also set if the operands have 
different signs and the sign of the difference differs from the sign of r[rs1]. 


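Format:
  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-5          | 4-0
  i = 0:  10    | rd    | 100001 | rs1   | i=0 | unused (zero) | rs2

  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-0
  i = 1:  10    | rd    | 100001 | rs1   | i=1 | simm13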


Syntax: 
tsubcc  reg_rs1, reg_rs2, reg_rd
tsubcc  reg_rs1, immediate, reg_rd
Traps: 
(none) 


Condition Code Modified: 


n, z, v, c


Example: 


tsubcc  %g0, 2, %g0     ! nzvc = 1011
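
The overflow rule for TSUBcc combines a tag check with ordinary signed-subtraction overflow. The following Python model is a sketch for working through examples by hand; `tsubcc` is a hypothetical helper, operands are given as already-resolved 32-bit values, and the carry bit is modeled as the borrow of an unsigned subtraction, which is an assumption drawn from general SPARC behavior rather than from the text above.

```python
MASK32 = 0xFFFFFFFF

def tsubcc(a: int, b: int):
    """Return (result, (n, z, v, c)) for a tagged subtract of b from a."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32

    def sign(x: int) -> int:
        return x >> 31

    tag_set = ((a | b) & 0b11) != 0  # bit 1 or bit 0 of either operand nonzero
    arith_overflow = sign(a) != sign(b) and sign(result) != sign(a)

    n = sign(result)
    z = int(result == 0)
    v = int(tag_set or arith_overflow)
    c = int(a < b)  # borrow out of the unsigned subtraction (assumed)
    return result, (n, z, v, c)

# tsubcc %g0, 2, %g0: the immediate 2 has bit 1 set, so V is set
print(tsubcc(0, 2))
```

With operands whose two low-order bits are clear, such as 8 and 4, the tag check passes and V reflects arithmetic overflow alone.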
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TSUBccTV TSUBccTV 


Tagged Subtract, modify condition codes and Trap on Overflow 





Description: 


Computes either “r[rs1]-r[rs2]” if the i field is zero, or “r[rs1] - sign_ext(simm13)”
if the i field is one, and places the result in the destination specified by the rd field.


A tag_overflow occurs if bit 1 or bit 0 of either operand is nonzero, or if the sub- 
traction generates an arithmetic overflow (the operands have different signs and 
the sign of the difference differs from the sign of r[rs1]). 


If TSUBccTV causes a tag_overflow, a tag_overflow trap is generated and the des- 
tination register (rd) and condition codes remain unchanged. If a tag_overflow 
does not occur, the integer condition codes are updated (v=0). 





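Format:
  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-5          | 4-0
  i = 0:  10    | rd    | 100011 | rs1   | i=0 | unused (zero) | rs2

  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-0
  i = 1:  10    | rd    | 100011 | rs1   | i=1 | simm13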


Syntax: 


tsubcctv  reg_rs1, reg_rs2, reg_rd
tsubcctv  reg_rs1, immediate, reg_rd


Traps: 


tag_overflow 


Condition Code Modified: 


n, z, v, c


Example: 


tsubcctv  %g0, 2, %g0   ! tag_overflow trap (bit 1 of the immediate is set)
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Tvc 


Trap on Overflow Clear





Description: 


The TVC instruction causes a trap_instruction trap if V=0 and if no higher priority 
trap is pending. The trap_instruction trap causes the tt field of the Trap Base Reg- 
ister (TBR) to be written with 128 plus the least significant seven bits of either 
“r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if the i field is 
one. 


If V=1, a trap_instruction trap does not occur and the instruction behaves like a
NOP. All bits indicated as reserved in the instruction formats should be supplied
as zero, as should the most significant 25 bits of r[rs2] if the i field is 0.


(note: If single vector trapping is enabled, the trap_instruction trap will vector to 
the location pointed to by the Trap Base Address in the TBR, and the tt field will 
be ignored) 


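Format:
  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-5     | 4-0
  i = 0:  10    | reserved | 1111  | 111010 | rs1   | i=0 | reserved | rs2

  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-7     | 6-0
  i = 1:  10    | reserved | 1111  | 111010 | rs1   | i=1 | reserved | software trap #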


Syntax: 
tvc     reg_rs1, reg_rs2
tvc     reg_rs1, immediate
Traps: 


trap_instruction 


Condition Code Modified: 


(none) 


Example: 


tvc     %g0 + 46        ! tt = 174
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TVS 


Trap on Overflow Set 


Description: 


The TVS instruction causes a trap_instruction trap if V=1 and if no higher priority 
trap is pending. The trap_instruction trap causes the tt field of the Trap Base Reg- 
ister (TBR) to be written with 128 plus the least significant seven bits of either 
“r[rs1] + r[rs2]” if the i field is zero, or “r[rs1] + sign_ext(simm13)” if the i field is
one.

If V=0, a trap_instruction trap does not occur and the instruction behaves like a
NOP. All bits indicated as reserved in the instruction formats should be supplied
as zero, as should the most significant 25 bits of r[rs2] if the i field is 0.


(note: If single vector trapping is enabled, the trap_instruction trap will vector to 
the location pointed to by the Trap Base Address in the TBR, and the tt field will 
be ignored) 


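Format:
  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-5     | 4-0
  i = 0:  10    | reserved | 0111  | 111010 | rs1   | i=0 | reserved | rs2

  Bits:   31-30 | 29       | 28-25 | 24-19  | 18-14 | 13  | 12-7     | 6-0
  i = 1:  10    | reserved | 0111  | 111010 | rs1   | i=1 | reserved | software trap #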


Syntax: 


tvs     reg_rs1, reg_rs2
tvs     reg_rs1, immediate


Traps: 


trap_instruction 


Condition Code Modified: 


(none) 
Example: 
tvs     %g0 + 47        ! tt = 175
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UMUL 


Unsigned Integer Multiply





Description: 


UMUL performs either “r[rs1] x r[rs2]” if the i field is zero, or “r[rs1] x 
sign_ext(simm13)” if the i field is one. The 32 least significant bits of the product 
are written to the destination register r[rd]. The most significant bits of the prod- 
uct are written to the Y register. 


The UMUL operation takes 5 cycles to compute a 32 bit x word operation, 3 cycles 
to compute a 32 bit x halfword operation, and 2 cycles to compute a 32 bit x byte 
operation. To do this, the hardware tests the most significant 16, 24 or 32 bits of 
r[rs2] against the sign bit at run time. If the bits match, the UMUL instruction will 
terminate in 3, 2 or 1 cycle respectively. 


UMUL assumes an unsigned integer word operand and computes an unsigned 
integer doubleword product. 


Format:
  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-5          | 4-0
  i = 0:  10    | rd    | 001010 | rs1   | i=0 | unused (zero) | rs2

  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-0
  i = 1:  10    | rd    | 001010 | rs1   | i=1 | simm13


Syntax: 
umul    reg_rs1, reg_rs2, reg_rd
umul    reg_rs1, immediate, reg_rd
Traps: 
(none) 


Condition Code Modified: 


(none) 

Example: 
umul    %o2, %o3, %o1   ! least significant half product to reg o1
rd      %y, %o0         ! most significant half product to reg o0
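
The split of the doubleword product between r[rd] and the Y register can be illustrated with a few lines of Python. This is a sketch only: `umul` is a hypothetical helper that takes the two 32-bit operand values and returns the pair that would land in r[rd] and Y.

```python
MASK32 = 0xFFFFFFFF

def umul(rs1_value: int, operand: int):
    """Return (rd_value, y_value) for an unsigned 32 x 32 multiply."""
    product = (rs1_value & MASK32) * (operand & MASK32)  # up to 64 bits
    return product & MASK32, product >> 32

rd, y = umul(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(rd), hex(y))
```

Reading Y after the multiply, as in the example above, is what recovers the upper half; if only r[rd] is used, any product of 2^32 or more is silently truncated.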
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UMULcc UMULcc 
Unsigned Integer Multiply and Change Condition Codes


Description: 


UMULcc performs either “r[rs1] x r[rs2]” if the i field is zero, or “r[rs1] x
sign_ext(simm13)” if the i field is one. The 32 least significant bits of the product
are written to the destination register r[rd]. The most significant bits of the product
are written to the Y register.


The UMULcc operation takes 5 cycles to compute a 32 bit x word operation, 3 
cycles to compute a 32 bit x halfword operation, and 2 cycles to compute a 32 bit x 
byte operation. To do this, the hardware tests the most significant 16, 24 or 32 bits 
of r[rs2] against the sign bit at run time. If the bits match, the UMULcc instruction
will terminate in 3, 2 or 1 cycle respectively. 


UMULcc assumes an unsigned integer word operand and computes an unsigned
integer doubleword product. UMULcc writes the integer condition code bits (see
below).

Format:
  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-5          | 4-0
  i = 0:  10    | rd    | 011010 | rs1   | i=0 | unused (zero) | rs2

  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-0
  i = 1:  10    | rd    | 011010 | rs1   | i=1 | simm13
Syntax: 
umulcc  reg_rs1, reg_rs2, reg_rd
umulcc  reg_rs1, immediate, reg_rd
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Signed Integer Multiply and Change Condition Codes (Continued) 





Traps: 
(none) 
Condition Code Modified:

    n    Set if product[31] = 1
    z    Set if product[31:0] = 0
    v    Zero
    c    Zero





Example: 
umulcc  %o2, %o3, %o1   ! least significant half product to reg o1
rd      %y, %o0         ! most significant half product to reg o0
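
Note that the condition codes are derived from the low word of the product only, so Z can be set even when the full doubleword product is nonzero. A small Python sketch makes this visible; `umulcc` is a hypothetical helper, not a definition taken from this manual.

```python
MASK32 = 0xFFFFFFFF

def umulcc(rs1_value: int, operand: int):
    """Return (rd_value, y_value, flags) for an unsigned multiply that sets icc."""
    product = (rs1_value & MASK32) * (operand & MASK32)
    low = product & MASK32
    flags = {
        "n": low >> 31,      # set if product[31] = 1
        "z": int(low == 0),  # set if product[31:0] = 0
        "v": 0,              # always cleared
        "c": 0,              # always cleared
    }
    return low, product >> 32, flags

# 0x10000 * 0x10000 = 2**32: low word is zero, Y holds 1
print(umulcc(0x10000, 0x10000))
```

Code that needs to test the whole 64-bit result for zero would therefore have to examine the Y register as well.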
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WRASR WRASR 


Description: 


WRASR writes “r[rs1] xor r[rs2]” if the i field is zero, or “r[rs1] xor
sign_ext(simm13)” if the i field is one, to the writable fields of the ASR register
specified in rd (16-31).


On the SPARClite MB86930 a valid rd value is 17. All other values of rd will generate
an illegal_instruction trap.


WRASR is a privileged instruction. 





Format:
  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-5          | 4-0
  i = 0:  10    | rd    | 110000 | rs1   | i=0 | unused (zero) | rs2

  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-0
  i = 1:  10    | rd    | 110000 | rs1   | i=1 | simm13
Syntax: 
wr      reg_rs1, reg_rs2, asr_reg_rd
wr      reg_rs1, immediate, asr_reg_rd
Traps: 
illegal_instruction 
privileged_instruction 
Condition Code Modified: 
(none) 
Example: 
wr      %g0, 1, %asr17  ! enable single vector trapping
wr      %g0, 0, %asr17  ! disable single vector trapping
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WRPSR 





Description: 


WRPSR causes a delayed write of “r[rs1] xor r[rs2]” if the i field is zero, or “r[rs1]
xor sign_ext(simm13)” if the i field is one, to the writable fields of the PSR
register.

WRPSR is a privileged instruction. See section 2.4.7 for programming consider- 
ations. 
Format:
  Bits:   31-30 | 29-25    | 24-19  | 18-14 | 13  | 12-5          | 4-0
  i = 0:  10    | reserved | 110001 | rs1   | i=0 | unused (zero) | rs2

  Bits:   31-30 | 29-25    | 24-19  | 18-14 | 13  | 12-0
  i = 1:  10    | reserved | 110001 | rs1   | i=1 | simm13


Note: reserved fields should be programmed as 0. 


Syntax: 

wr      reg_rs1, reg_rs2, %psr
wr      reg_rs1, immediate, %psr
Traps: 


privileged_instruction 


Condition Code Modified:


(none)
Example: 
wr      %g0, 0xfc7, %psr    ! 15 to pil, 1 to s & ps, 0 to et, 7 to cwp
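
Because the write instructions store “r[rs1] xor operand”, using %g0 as rs1 writes the operand unchanged. The Python sketch below shows how an immediate for the low PSR fields might be composed and then passed through the xor. The field positions (PIL in bits 11-8, S in bit 7, PS in bit 6, ET in bit 5, CWP in bits 4-0) follow the standard SPARC PSR layout and are stated here as an assumption; `psr_low_bits` and `wr_value` are hypothetical helpers.

```python
MASK32 = 0xFFFFFFFF

def psr_low_bits(pil: int, s: int, ps: int, et: int, cwp: int) -> int:
    """Pack the PIL, S, PS, ET and CWP fields into a PSR image."""
    return ((pil & 0xF) << 8) | ((s & 1) << 7) | ((ps & 1) << 6) \
        | ((et & 1) << 5) | (cwp & 0x1F)

def wr_value(rs1_value: int, operand: int) -> int:
    """Value presented to the state register: r[rs1] xor operand."""
    return (rs1_value ^ operand) & MASK32

image = psr_low_bits(pil=15, s=1, ps=1, et=0, cwp=7)
print(hex(wr_value(0, image)))  # rs1 = %g0 reads as zero
```

With a nonzero rs1 the same instruction toggles bits rather than setting them, which is convenient for flipping a single field such as ET in a previously read PSR value.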
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WRTBR WRTBR 





Description: 


WRTBR causes a delayed write of “r[rs1] xor r[rs2]” if the i field is zero, or “r[rs1] 
xor sign_ext(simm13)” if the i field is one, to the writable fields of the TBR 
register. 


WRTBR is a privileged instruction.


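Format:
  Bits:   31-30 | 29-25    | 24-19  | 18-14 | 13  | 12-5          | 4-0
  i = 0:  10    | reserved | 110011 | rs1   | i=0 | unused (zero) | rs2

  Bits:   31-30 | 29-25    | 24-19  | 18-14 | 13  | 12-0
  i = 1:  10    | reserved | 110011 | rs1   | i=1 | simm13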


Note: reserved fields should be programmed as 0. 





Syntax: 

wr      reg_rs1, reg_rs2, %tbr
wr      reg_rs1, immediate, %tbr
Traps: 


privileged_instruction 


Condition Code Modified: 


(none) 


Example: 


wr      %g0, 0x1000, %tbr
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WRWIM WRWIM 





Description: 


WRWIM causes a delayed write of “r[rs1] xor r[rs2]” if the i field is zero, or “r[rs1] 
xor sign_ext(simm13)” if the i field is one, to the writable fields of the WIM 
register. 


WRWIM is a privileged instruction. 


Format:
  Bits:   31-30 | 29-25    | 24-19  | 18-14 | 13  | 12-5          | 4-0
  i = 0:  10    | reserved | 110010 | rs1   | i=0 | unused (zero) | rs2

  Bits:   31-30 | 29-25    | 24-19  | 18-14 | 13  | 12-0
  i = 1:  10    | reserved | 110010 | rs1   | i=1 | simm13


Note: reserved fields should be programmed as 0. 


Syntax: 

wr      reg_rs1, reg_rs2, %wim
wr      reg_rs1, immediate, %wim
Traps: 


privileged_instruction 


Condition Code Modified: 


(none) 
Example: 
wr      %g0, -256, %wim     ! only windows 0 to 7 valid
                            ! windows 8 and above invalid
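
The immediate -256 in the example works because the 13-bit field is sign-extended before the xor, setting every mask bit from bit 8 upward. A short Python sketch shows the resulting mask; it assumes the usual WIM convention that a set bit marks the corresponding window invalid, and the helpers `sign_ext13` and `valid_windows` are hypothetical. The model keeps all 32 bits, whereas real hardware retains only the writable fields.

```python
MASK32 = 0xFFFFFFFF

def sign_ext13(simm13: int) -> int:
    """Sign-extend a 13-bit immediate field to a 32-bit value."""
    simm13 &= 0x1FFF
    return (simm13 | 0xFFFFE000) if simm13 & 0x1000 else simm13

def valid_windows(wim: int, nbits: int = 32):
    """List window numbers whose WIM bit is clear (assumed to mean valid)."""
    return [w for w in range(nbits) if not (wim >> w) & 1]

wim = (0 ^ sign_ext13(-256)) & MASK32  # wr %g0, -256, %wim
print(hex(wim), valid_windows(wim))
```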
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WRY 


WRY writes “r[rs1] xor r[rs2]” if the i field is zero, or “r[rs1] xor
sign_ext(simm13)” if the i field is one, to the Y register.


Unlike the other write state register instructions, WRY is not a privileged
instruction.


Format:
  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-5          | 4-0
  i = 0:  10    | 00000 | 110000 | rs1   | i=0 | unused (zero) | rs2

  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-0
  i = 1:  10    | 00000 | 110000 | rs1   | i=1 | simm13


Note: reserved fields should be programmed as 0. 


Syntax: 

wr      reg_rs1, reg_rs2, %y
wr      reg_rs1, immediate, %y
Traps: 

(none) 


Condition Code Modified: 


(none) 
Example: 
wr      %g0, 0, %y      ! clear reg y
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XNOR 







Description: 


Implements a bitwise logical exclusive Nor to compute either “r[rs1] xnor r[rs2]”
if the i field is zero, or “r[rs1] xnor sign_ext(simm13)” if the i field is one, and
places the result in the destination specified by the rd field.


Format:
  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-5          | 4-0
  i = 0:  10    | rd    | 000111 | rs1   | i=0 | unused (zero) | rs2

  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-0
  i = 1:  10    | rd    | 000111 | rs1   | i=1 | simm13
Syntax: 
xnor    reg_rs1, reg_rs2, reg_rd
xnor    reg_rs1, immediate, reg_rd
Traps: 
(none) 
Condition Code Modified: 
(none) 
Example: 
xnor    %l1, 0, %l1     ! complement reg l1
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XNORcc XNORcc 







Description: 


Implements a bitwise logical exclusive Nor to compute either “r[rs1] xnor r[rs2]” 
if the i field is zero, or “r[rs1] xnor sign_ext(simm13)” if the i field is one, and 
places the result in the destination specified by the rd field. 


XNORcc modifies the integer condition codes.


Format:
  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-5          | 4-0
  i = 0:  10    | rd    | 010111 | rs1   | i=0 | unused (zero) | rs2

  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-0
  i = 1:  10    | rd    | 010111 | rs1   | i=1 | simm13





Syntax: 
xnorcc  reg_rs1, reg_rs2, reg_rd
xnorcc  reg_rs1, immediate, reg_rd
Traps: 
(none) 


Condition Code Modified: 


n, z, v=0, c=0


Example: 
xnorcc  %l1, %l2, %g0   ! do any bits in reg l1 match corresponding bits
                        ! in reg l2?
bne eV Z ! skip ahead if not 
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XOR 







Description: 


Implements a bitwise logical exclusive Or to compute either “r[rs1] xor r[rs2]” if
the i field is zero, or “r[rs1] xor sign_ext(simm13)” if the i field is one, and places
the result in the destination specified by the rd field.


Format:
  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-5          | 4-0
  i = 0:  10    | rd    | 000011 | rs1   | i=0 | unused (zero) | rs2

  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-0
  i = 1:  10    | rd    | 000011 | rs1   | i=1 | simm13


Syntax: 
xor     reg_rs1, reg_rs2, reg_rd
xor     reg_rs1, immediate, reg_rd
Traps: 
(none) 


Condition Code Modified: 


(none) 
Example: 
xor     %l1, -1, %l1    ! complement reg l1
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XORcc 





Description: 


Implements a bitwise logical exclusive Or to compute either “r[rs1] xor r[rs2]” if
the i field is zero, or “r[rs1] xor sign_ext(simm13)” if the i field is one, and places
the result in the destination specified by the rd field.


XORcc modifies the integer condition codes. 


Format:
  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-5          | 4-0
  i = 0:  10    | rd    | 010011 | rs1   | i=0 | unused (zero) | rs2

  Bits:   31-30 | 29-25 | 24-19  | 18-14 | 13  | 12-0
  i = 1:  10    | rd    | 010011 | rs1   | i=1 | simm13


Syntax: 
xorcc   reg_rs1, reg_rs2, reg_rd
xorcc   reg_rs1, immediate, reg_rd
Traps: 
(none) 


Condition Code Modified: 


n, z, v=0, c=0


Example: 


xorcc   %l1, -1, %l1    ! complement reg l1 and test result
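
The exclusive-Or and exclusive-Nor instructions that modify the condition codes can be modeled together, since they differ only in a final inversion. The following Python sketch is illustrative; `xorcc`, `xnorcc`, and `logical_cc` are hypothetical helpers, and the clearing of V and C reflects the usual SPARC behavior for logical operations, stated here as an assumption.

```python
MASK32 = 0xFFFFFFFF

def logical_cc(result: int):
    """Condition codes after a logical operation: N and Z from the result, V = C = 0."""
    return {"n": result >> 31, "z": int(result == 0), "v": 0, "c": 0}

def xorcc(a: int, b: int):
    result = (a ^ b) & MASK32
    return result, logical_cc(result)

def xnorcc(a: int, b: int):
    result = ~(a ^ b) & MASK32
    return result, logical_cc(result)

# xorcc %l1, -1, %l1 complements the register; xnor with 0 gives the same result
print(xorcc(0x0000FFFF, -1))
print(xnorcc(0x0000FFFF, 0))
```

Both calls yield the same complemented value, which is why either instruction can serve as a bitwise NOT.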
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JTAG 


A.1 Introduction 


With the increased use of surface mount devices and the ever-increasing density
of printed circuit boards, traditional in-circuit and functional testing has become
difficult and expensive. To reduce the complexity of board testing, a boundary-scan
test technique has been adopted by the Joint Test Action Group (JTAG).


The JTAG standard requires that a boundary-scan cell be placed between each component
pin and the chip logic within an IC. On SPARClite a boundary-scan cell consists
of at least one shift register bit and some multiplexing. All the boundary-scan
cells within SPARClite are connected as one long shift register. This allows test
access to the component pins. Components with JTAG can be connected serially 
on a board to provide test access to all the components plus access to the board 
traces. For more detailed information, consult IEEE Standard 1149.1. 
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_A.2 Test Access Ports (TAP) 


SPARClite has five dedicated pins for JTAG.


    Pin      Description
    TCK      Test Clock
    TMS      Test Mode Select
    TDI      Test Data Input
    TDO      Test Data Output
    -TRST    Test Reset





A.2.1 TCK 


JTAG uses a test clock independent of the component-specific system clock. This is
necessary to be able to shift the serial test data through components with different 
operating frequencies. An independent test clock allows shifting of test data con- 
currently with the system operation of the component and without changing the 
state of the on-chip system logic. Following are the JTAG requirements and clock 
specifications. 


1. The JTAG test logic state will remain unchanged indefinitely when TCK=0. 
2. A 50% duty cycle clock is recommended.


A.2.2 TMS 


The sequence of TMS inputs is used to put the JTAG test logic into a particular 
test mode. The test logic must be in the correct test mode to shift-in instructions, 
to do data-shifts and do other operations. 


1. TMS input is sampled by the test logic at the rising edge of TCK. 


2. Undriven TMS input appears as a logic “1” to the test logic. This is to ensure 
that the test logic will sequence to the Test_Logic_Reset state if the TMS is held 
high for at least five rising edges of TCK. The test logic will remain in the 
Test_Logic_Reset state as long as TMS=1. (See “Test Logic Reset” on 
page A-10.) 
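
The five-clock reset guarantee can be verified against the TAP controller state table. The sketch below is an illustration in Python and assumes the standard sixteen-state IEEE 1149.1 transition table; the state names are spelled with underscores to match this appendix, and `TAP_NEXT` and `clock_tap` are hypothetical names.

```python
# state: (next state when TMS = 0, next state when TMS = 1), sampled on rising TCK
TAP_NEXT = {
    "Test_Logic_Reset": ("Run_Test_Idle", "Test_Logic_Reset"),
    "Run_Test_Idle":    ("Run_Test_Idle", "Select_DR_Scan"),
    "Select_DR_Scan":   ("Capture_DR", "Select_IR_Scan"),
    "Capture_DR":       ("Shift_DR", "Exit1_DR"),
    "Shift_DR":         ("Shift_DR", "Exit1_DR"),
    "Exit1_DR":         ("Pause_DR", "Update_DR"),
    "Pause_DR":         ("Pause_DR", "Exit2_DR"),
    "Exit2_DR":         ("Shift_DR", "Update_DR"),
    "Update_DR":        ("Run_Test_Idle", "Select_DR_Scan"),
    "Select_IR_Scan":   ("Capture_IR", "Test_Logic_Reset"),
    "Capture_IR":       ("Shift_IR", "Exit1_IR"),
    "Shift_IR":         ("Shift_IR", "Exit1_IR"),
    "Exit1_IR":         ("Pause_IR", "Update_IR"),
    "Pause_IR":         ("Pause_IR", "Exit2_IR"),
    "Exit2_IR":         ("Shift_IR", "Update_IR"),
    "Update_IR":        ("Run_Test_Idle", "Select_DR_Scan"),
}

def clock_tap(state: str, tms_bits) -> str:
    """Apply one TMS value per rising TCK edge and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state

# From any state, five rising edges with TMS = 1 end in Test_Logic_Reset.
print(all(clock_tap(s, [1] * 5) == "Test_Logic_Reset" for s in TAP_NEXT))
```

Four edges are not always enough: starting from Shift_DR, four clocks with TMS high only reach Select_IR_Scan, which is why five is the stated minimum.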
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A.2.3 TDI 


The TDI pin is used to input test instructions and test data. 
1. The TDI input is sampled by the test logic at the rising edge of TCK. 
2. Undriven TDI input appears as a logic “1” to the test logic. 


3. No logic inversion takes place when data is being shifted from TDI towards 
TDO. 


4. Changing the TDI input at the falling edge of TCK is recommended.
A.2.4 TDO 


TDO is the serial output for the test instructions and data from the test logic. 
1. TDO output is valid after the falling edge of TCK. 


2. TDO output is in the high-impedance state when data or instruction is not 
scanned. 


A.2.5 -TRST


-TRST is an asynchronous test logic reset pin.


1. The test logic is forced into the Test_Logic_Reset state asynchronously when a
logic “0” is applied to the -TRST pin.


2. If it is not being driven, the -TRST pin appears as a logic “1” to the test logic. This
is to ensure normal test operation in the event of an unterminated -TRST.


3. -TRST does not initialize any system logic within the component.


4. To ensure deterministic operation of the test logic, the TMS input should be
held at 1 while the -TRST signal changes from 0 to 1.


A.3 Test Instructions 


SPARClite implements the three JTAG public instructions: BYPASS,
SAMPLE/PRELOAD and EXTEST.


SPARClite contains a two bit JTAG instruction register which receives the instruc- 
tion serially from the TDI input. The instruction bits are shifted-in at the rising 
edge of TCK. For fault isolation of the board level serial test data path, a constant 
binary “01” pattern is loaded into the instruction shift register at the start of the 
instruction-shift cycle. Therefore, a “01” pattern will appear at the TDO output in 
the beginning of the instruction-shift cycle. 


When shifting the instruction into the instruction register, the least significant bit 
of the instruction needs to be shifted in first, followed by the most significant bit. 
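
The instruction-shift behavior described above can be traced bit by bit. The Python sketch below models the two-bit instruction shift register under two assumptions that are not spelled out in the text: TDI feeds the most significant stage, and TDO is driven from the least significant stage. `IR_CAPTURE` and `shift_ir` are hypothetical names.

```python
IR_CAPTURE = 0b01  # constant pattern loaded at the start of the instruction-shift cycle

def shift_ir(instruction: int):
    """Shift a 2-bit instruction in LSB first; return (final register, TDO bits)."""
    reg = IR_CAPTURE
    tdo_bits = []
    for k in range(2):
        tdi = (instruction >> k) & 1   # least significant bit is shifted in first
        tdo_bits.append(reg & 1)       # bit leaving the register towards TDO
        reg = (tdi << 1) | (reg >> 1)  # shift one position towards TDO
    return reg, tdo_bits

for code in (0b11, 0b01, 0b00):        # BYPASS, SAMPLE/PRELOAD, EXTEST
    print(format(code, "02b"), shift_ir(code))
```

Whatever instruction is shifted in, the two bits observed at TDO are the captured “01” pattern, which is what makes it usable for checking the integrity of the board-level scan path.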


JTAG - Test Instructions
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A.3.1 BYPASS 


The BYPASS instruction is used to bypass a component that is connected in series 
with other components. This allows more rapid movement of test data through 
the components of the board, bypassing the ones that do not need to be tested. 
The BYPASS operation enables the bypass register, which is a single stage shift 
register, between TDI and TDO. 


1. The binary code for the BYPASS instruction is 11.


2. The BYPASS instruction is forced into the instruction register output latches
during the Test_Logic_Reset state. Note the distinction between the “01” content
of the instruction shift register and the “11” content of the instruction register
output latch. Therefore, at the start of the instruction-shift cycle, a “01”
pattern will be seen instead of “11”.


3. The BYPASS operation does not interfere with the component operation at all.
If the TDI input trace to the component is somehow disconnected, the test
logic will see a “11” at TDI input during the instruction-shift state. Therefore,
no unwanted interference with the on-chip system logic occurs.
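
The effect of the single-stage bypass register on a board-level scan chain can be sketched as follows. This Python model is only an illustration: it assumes every bypass stage starts at 0 and that one bit moves per TCK while shifting, and `shift_through_bypass` is a hypothetical helper.

```python
def shift_through_bypass(bits_in, n_components: int):
    """Shift bits through n components in BYPASS; return the bits seen at the last TDO."""
    if n_components < 1:
        raise ValueError("need at least one component in the chain")
    stages = [0] * n_components        # one bypass flip-flop per component
    bits_out = []
    for bit in bits_in:
        bits_out.append(stages[-1])    # bit leaving the final component
        stages = [bit] + stages[:-1]   # every stage advances by one position
    return bits_out

# Three bypassed components delay the data stream by three TCK cycles.
print(shift_through_bypass([1, 0, 1, 1, 0, 0, 0], 3))
```

Each bypassed component therefore costs a single shift cycle, compared with one cycle per boundary-scan cell when its full scan path is selected.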


A.3.2 SAMPLE/PRELOAD 


The SAMPLE/PRELOAD instruction is used to sample the state of the compo- 
nent pins. The sampled values can be examined by shifting out the data through 
TDO. This instruction can also be used to preload the boundary-scan cell output 
latches with specific values. The preloaded values are then enabled to the output 
pins by the EXTEST. 


1. The binary code for the instruction is 01.


2. The SAMPLE/PRELOAD instruction selects the boundary-scan cells to be
connected between TDI and TDO in the Shift_DR TAP controller state (see
section A.4).


3. The values of the component pins are sampled on the rising edge of TCK in
the Capture_DR TAP controller state.


4. The preload values shifted into the boundary-scan cells are latched into the
boundary-scan output latch at the falling edge of TCK in the Update_DR TAP
controller state.
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A.3.3 EXTEST 


The EXTEST instruction allows testing of off-chip circuitry and board-level
interconnections. The SAMPLE/PRELOAD instruction is used to preload the data into the
latched parallel outputs of the boundary-scan shift register stages. Then, the
EXTEST instruction enables the preloaded values to the component's output pins.


1. The binary code for the instruction is 00. 

2. SPARClite outputs the preloaded data to the pins at the falling edge of TCK in
the Update_IR TAP controller state, at which point the JTAG instruction register
is updated with the EXTEST instruction.

3. The EXTEST instruction selects the boundary-scan cells to be connected 
between TDI and TDO in the Shift_DR test logic controller state. 


4. Once the EXTEST instruction is effective, the output pins can change at the 
falling edge of TCK in the Update_DR TAP controller state. 


A.3.4 JTAG Cells 


SPARClite's JTAG test data scan path is composed of input cells, output cells, I/O 
cells and output cells with set control. The basic structures of the cells are shown 
in the accompanying figures. As the name implies, the input cell is used for input- 
only pins and the output cell is used for output-only pins. The I/O cell is used for 
the I/O pins and the output cell with set control is used for I/O buffer control. 


With each group of I/O pins there is an I/O buffer control JTAG cell which is 
used to control the direction of the I/O pins during EXTEST operation. This 
implies that within the data-scan path there are cells which do not correspond to a 
pin, but are used for I/O buffer control during EXTEST operation. 


Note that the output cell and the I/O cell have an output latch separate from the 
shift register. This allows the output to remain unchanged during a data-shift 
operation during the EXTEST mode. The cell output latches are updated during 
the Update_DR state (see section A.4). 


A.3.5 Input Cell 


For SPARClite, an input cell structure with signal capture only capability has 
been chosen to minimize the propagation delay from the input pins to the on-chip 
system logic. Using the SAMPLE/PRELOAD instruction, the user can sample the 
input pin and scan out the sampled value. 
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A.3.6 Output Cell 


The output cell has the capability to output a preloaded value to the output pin 
during EXTEST. During EXTEST, the source of the output changes from the chip 
logic to the output latch of the JTAG output cell. The output value in the cell is 
preloaded using the SAMPLE/PRELOAD instruction. 


A.3.7 I/O Cell 


The I/O cell is actually composed of an input cell and an output cell. Therefore, 
for each I/O pin there are two cells associated with the pin. Hence, when the data 
is shifted out through TDO, two bits for each I/O pin will be seen. As mentioned 
previously, an I/O buffer control cell is associated with each group of I/O pins. 
For example, the 32-bit data bus is controlled by the data I/O buffer control cell. 
The I/O buffer control cell is also in the data-scan path, so the user can
control the direction of the I/O buffer during EXTEST.
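
As a quick arithmetic check of the description above, the number of scan-path bits contributed by one group of I/O pins is two cells per pin plus one I/O buffer control cell. The helper name below is hypothetical and is used only for illustration.

```python
def scan_bits_for_io_group(num_pins):
    """Scan-path bits for one I/O group: two cells per pin plus one control cell."""
    return 2 * num_pins + 1


# The 32-bit data bus D<31:0> with its data I/O buffer control cell.
data_bus_bits = scan_bits_for_io_group(32)
print(data_bus_bits)
```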


A.3.8 Output Cell with Set 


This cell is used as the I/O buffer control cell. The output latch of the cell is set
during the Test_Logic_Reset state so that if EXTEST is entered after reset, the I/O
pins are in the input mode. There is one I/O buffer control cell for each group of
I/O signals.


I/O buffer control cell name    I/O pins

emudiojo                        EMU_D<3:0>, EMU_SD<3:0>
emuenblio                       -EMU_ENB
dbusiojo                        D<31:0>
tstatejo                        Output pins†

† Not all output pins are three-statable.

Figure A-1. Input Cell Allowing Signal Capture Only
[Figure labels: From System Pin, To Output Pin, ShiftDR, ClockDR, From Last Cell, To Next Cell]


Figure A-2. Output Cell
[Figure labels: Data From Internal Logic, Mode, ShiftDR, ClockDR, UpdateDR, From Last Cell, To Next Cell]


Figure A-3. Output Cell with Set
[Figure labels: Output Control From Internal Logic, Mode, ShiftDR, ClockDR, UpdateDR, Set, From Last Cell, To Next Cell]


Figure A-4. I/O Structure
[Figure labels: System Pin, To Output Enable, Output Enable, To/From Internal Logic, Input Data, Input Cell, Output Data, Output Cell, From Last Cell, To Next Cell]


A.4 Operation


The JTAG control logic, also referred to as the TAP controller, is implemented
as a synchronous finite state machine. The asynchronous reset input (-TRST) and
the TMS input control the state transitions of the TAP controller. To shift
instructions into the instruction register and to perform test data scans, the TAP
controller must be in the appropriate state (see Figure A-5 and Figure A-6 for
the timing relationships). A TAP state transition diagram, with examples, is
provided on the following pages.


The usual sequence of operations is as follows. Initially, the TAP controller is
forced into the reset state, Test_Logic_Reset, by -TRST=0. Next, TMS is set to
“1” and -TRST is deasserted at the falling edge of TCK. At the next rising edge
of TCK, the TMS=1 value is sampled by the test logic and the TAP controller
remains in the reset state. The first step is then to shift the 2-bit instruction
into the JTAG instruction register.


Figure A-5. Test Logic Operation: Instruction Scan


To do so, the TAP controller must be moved to the Shift_IR state. To make the
transition from Test_Logic_Reset to Shift_IR, the TMS sequence must be
0 -> 1 -> 1 -> 0 -> 0. Remember that the TMS input should change at the falling
edge of TCK so that enough setup time is available before the rising edge of TCK,
at which point the TMS input is sampled. The TAP controller changes state at the
rising edge of TCK. Once in the Shift_IR state, the instruction bits at TDI are
shifted into the JTAG instruction register at the rising edge of TCK. Suppose the
instruction shifted in is SAMPLE/PRELOAD. As soon as the instruction has been
shifted in, the TAP controller must move to the Exit1_IR state to terminate the
instruction scan; otherwise, more than 2 bits will be shifted into the instruction
register.
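
The instruction scan described above can be written out as one (TMS, TDI) pair per rising edge of TCK. The sketch below is illustrative and rests on assumptions not stated in this section: that one bit is shifted on every rising edge while the controller is in Shift_IR, including the edge that moves it to Exit1_IR (standard IEEE 1149.1 behavior), and that the least significant instruction bit is shifted first. The function name and the opcode value are hypothetical placeholders, not documented encodings.

```python
def instruction_scan_vectors(instruction, width=2):
    """Return (TMS, TDI) pairs, one per rising TCK edge, starting in Test_Logic_Reset."""
    # Test_Logic_Reset -> Shift_IR: TMS = 0, 1, 1, 0, 0 (TDI is ignored here).
    vectors = [(tms, 0) for tms in (0, 1, 1, 0, 0)]
    for i in range(width):
        bit = (instruction >> i) & 1      # assumed LSB-first shift order
        last = i == width - 1
        # TMS = 1 on the final bit moves the controller to Exit1_IR,
        # so exactly `width` bits are shifted.
        vectors.append((1 if last else 0, bit))
    vectors.append((1, 0))  # Exit1_IR -> Update_IR
    vectors.append((0, 0))  # Update_IR -> Run_Test/Idle
    return vectors


HYPOTHETICAL_OPCODE = 0b01  # placeholder value, not a documented encoding
vectors = instruction_scan_vectors(HYPOTHETICAL_OPCODE)
for tms, tdi in vectors:
    print(tms, tdi)
```

Raising TMS together with the last instruction bit is what keeps the scan to exactly two bits in this model.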


For the SAMPLE/PRELOAD instruction, data shifts must take place either to
output the sampled values of the pins or to shift in the preload values for EXTEST.
Therefore, the TAP controller must change state from Exit1_IR to Shift_DR.
This is accomplished by applying the TMS sequence 1 -> 0 -> 1 -> 0 -> 0. Once
in the Shift_DR state, the TDI input is scanned into the shift-register portion
of the boundary-scan cells at the rising edge of TCK. Once the data scan is finished,
the TAP controller can be moved to the Run_Test/Idle state for the next
JTAG instruction.
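
The two TMS sequences quoted in this section can be checked against a small software model of the TAP controller. The sketch below is illustrative only: the transition table follows the standard IEEE 1149.1 state diagram, which the SPARClite TAP controller is assumed to match, and the names TAP_NEXT and walk are hypothetical.

```python
# Next-state table: state -> (next state if TMS=0, next state if TMS=1).
# Assumed to match the standard IEEE 1149.1 TAP controller state diagram.
TAP_NEXT = {
    "Test_Logic_Reset": ("Run_Test/Idle", "Test_Logic_Reset"),
    "Run_Test/Idle": ("Run_Test/Idle", "Select_DR_Scan"),
    "Select_DR_Scan": ("Capture_DR", "Select_IR_Scan"),
    "Capture_DR": ("Shift_DR", "Exit1_DR"),
    "Shift_DR": ("Shift_DR", "Exit1_DR"),
    "Exit1_DR": ("Pause_DR", "Update_DR"),
    "Pause_DR": ("Pause_DR", "Exit2_DR"),
    "Exit2_DR": ("Shift_DR", "Update_DR"),
    "Update_DR": ("Run_Test/Idle", "Select_DR_Scan"),
    "Select_IR_Scan": ("Capture_IR", "Test_Logic_Reset"),
    "Capture_IR": ("Shift_IR", "Exit1_IR"),
    "Shift_IR": ("Shift_IR", "Exit1_IR"),
    "Exit1_IR": ("Pause_IR", "Update_IR"),
    "Pause_IR": ("Pause_IR", "Exit2_IR"),
    "Exit2_IR": ("Shift_IR", "Update_IR"),
    "Update_IR": ("Run_Test/Idle", "Select_DR_Scan"),
}


def walk(state, tms_bits):
    """Return the states visited, one per rising edge of TCK."""
    visited = []
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
        visited.append(state)
    return visited


to_shift_ir = walk("Test_Logic_Reset", [0, 1, 1, 0, 0])
to_shift_dr = walk("Exit1_IR", [1, 0, 1, 0, 0])
print(to_shift_ir)
print(to_shift_dr)
```

Under these assumptions the first sequence ends in Shift_IR and the second, passing through Update_IR and Run_Test/Idle, ends in Shift_DR.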


A.5 The TAP Controller


A.5.1 TAP Controller State Diagram 
Specifications 


Rules 


1. The state diagram for the TAP controller is shown in Figure A-7. (Note the 
value shown adjacent to each state transition arc in this figure represents the 
signal present at TMS at the time of a rising edge at TCK.) 


2. All state transitions of the TAP controller must occur based on the value of TMS
at the time of a rising edge of TCK.


3. Actions of the test logic occur on either the rising or the falling edge of TCK in 
each controller state. 


Description 


The behavior of the TAP controller and other test logic in each of the controller
states is briefly described below. Note that the term Test Data Registers refers to
either the Bypass Register or the 152 JTAG cells connected as a shift register.


Test-Logic-Reset


The test logic is disabled so that normal operation of the on-chip system logic (i.e.,
in response to stimuli received through the system pins only) can continue
unhindered. This is achieved by initializing the instruction register with the BYPASS
instruction. Regardless of the original state of the controller, it enters
Test-Logic-Reset when the TMS input is held high for at least five rising edges of
TCK. The controller remains in this state while TMS is high.


If the controller should leave the Test-Logic-Reset controller state as a result of an 
erroneous low signal on the TMS line at the time of a rising edge on TCK (for 
example, a glitch due to external interference), it will return to the Test-Logic- 
Reset state following three rising edges of TCK with the TMS line at the intended 
high logic level. The operation of the test logic is such that no disturbance is 
caused to on-chip system logic operation as the result of such an error. On leaving 
the Test-Logic-Reset controller state, the controller moves into the Run-Test/Idle 
controller state where no action will occur because the current instruction has 
been set to select operation of the bypass register. The test logic is also inactive in 
the Select-DR-Scan and Select-IR-Scan controller states. 


Note that the TAP controller will also be forced to the Test-Logic-Reset controller
state by applying a low logic level to the -TRST input.
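
The "five rising edges" and "three rising edges" figures quoted above can be confirmed by following only the TMS=1 transitions from every state. The sketch below assumes the standard IEEE 1149.1 state diagram; the table and function names are hypothetical.

```python
# Next state when TMS is high at a rising edge of TCK
# (assumed to match the standard IEEE 1149.1 state diagram).
NEXT_IF_TMS_HIGH = {
    "Test-Logic-Reset": "Test-Logic-Reset",
    "Run-Test/Idle": "Select-DR-Scan",
    "Select-DR-Scan": "Select-IR-Scan",
    "Select-IR-Scan": "Test-Logic-Reset",
    "Capture-DR": "Exit1-DR",
    "Shift-DR": "Exit1-DR",
    "Exit1-DR": "Update-DR",
    "Pause-DR": "Exit2-DR",
    "Exit2-DR": "Update-DR",
    "Update-DR": "Select-DR-Scan",
    "Capture-IR": "Exit1-IR",
    "Shift-IR": "Exit1-IR",
    "Exit1-IR": "Update-IR",
    "Pause-IR": "Exit2-IR",
    "Exit2-IR": "Update-IR",
    "Update-IR": "Select-DR-Scan",
}


def clocks_to_reset(state):
    """Count rising TCK edges with TMS high needed to reach Test-Logic-Reset."""
    count = 0
    while state != "Test-Logic-Reset":
        state = NEXT_IF_TMS_HIGH[state]
        count += 1
    return count


worst_case = max(clocks_to_reset(s) for s in NEXT_IF_TMS_HIGH)
from_idle = clocks_to_reset("Run-Test/Idle")
print(worst_case, from_idle)
```

In this model the worst case over all sixteen states is five clocks, and recovery from Run-Test/Idle (the state reached after a single erroneous TMS=0) takes three.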


Run-Test/Idle


This is a controller state between scan operations. In the Run-Test/Idle controller
state, activity in selected test logic occurs only when certain instructions are present.


For instructions which do not cause functions to execute in the Run-Test/Idle 
controller state, all test data registers selected by the current instruction must 
retain their previous state (i.e., Idle). 


The instruction does not change while the TAP controller is in this state.


Select-DR-Scan 


This is a temporary controller state in which all test data registers selected by the 
current instruction retain their previous state. 


If TMS is held low and a rising edge is applied to TCK when the controller is in 
this state, then the controller moves into the Capture-DR state and a scan 
sequence for the selected test data register is initiated. If TMS is held high and a 
rising edge is applied to TCK the controller moves on to the Select-IR-Scan state. 


The instruction does not change while the TAP controller is in this state. 


Select-IR-Scan 


This is a temporary controller state in which all test data registers selected by the 
current instruction retain their previous state. 





If TMS is held low and a rising edge is applied to TCK when the controller is in 
this state, then the controller moves into the Capture-IR state and a scan sequence 
for the instruction register is initiated. If TMS is held high and a rising edge is 
applied to TCK the controller returns to the Test-Logic-Reset state. 


The instruction does not change while the TAP controller is in this state.


Capture-DR 


In this controller state data may be parallel loaded into test data registers selected 
by the current instruction on the rising edge of TCK. 


The instruction does not change while the TAP controller is in this state.


Shift-DR


In this controller state, the test data register connected between TDI and TDO as a 
result of the current instruction shifts data one stage towards its serial output on 
each rising edge of TCK. 


The instruction does not change while the TAP controller is in this state. 


Exit1-DR


This is a temporary controller state. If TMS is held high, a rising edge applied to 
TCK while in this state causes the controller to enter the Update-DR state, which 
terminates the scanning process. If TMS is held low and a rising edge is applied to 
TCK, the controller enters the Pause-DR state. 


All test data registers selected by the current instruction retain their previous state 
unchanged. 


The instruction does not change while the TAP controller is in this state.


Pause-DR 


This controller state allows shifting of the test data register in the serial path 
between TDI and TDO to be temporarily halted. All test data registers selected by 
the current instruction retain their previous state unchanged. 


The instruction does not change while the TAP controller is in this state.


Exit2-DR 


This is a temporary controller state. If TMS is held high and a rising edge is 
applied to TCK while in this state, the scanning process terminates and the TAP 
controller enters the Update-DR controller state. If TMS is held low and a rising 
edge is applied to TCK, the controller enters the Shift-DR state. 


All test data registers selected by the current instruction retain their previous state
unchanged.


The instruction does not change while the TAP controller is in this state. 


Update-DR 


Some test data registers are provided with a latched parallel output to prevent
changes at the parallel output while data is shifted in the associated shift-register
path in response to certain instructions (e.g., EXTEST). Data is latched onto the
parallel output of these test data registers from the shift-register path on the falling
edge of TCK in the Update-DR controller state. The data held at the latched
parallel output should not change other than in this controller state.


All shift-register stages in test data registers selected by the current instruction 
retain their previous state unchanged. 


The instruction does not change while the TAP controller is in this state. 


Capture-IR


In this controller state the shift-register contained in the instruction register loads 
a pattern of fixed logic values on the rising edge of TCK. 


Test data registers selected by the current instruction retain their previous state. 
The instruction does not change while the TAP controller is in this state. 


Shift-IR 


In this controller state the shift-register contained in the instruction register is con- 
nected between TDI and TDO and shifts data one stage towards its serial output 
on each rising edge of TCK. 


Test data registers selected by the current instruction retain their previous state.
The instruction does not change while the TAP controller is in this state.


Exit1-IR


This is a temporary controller state. If TMS is held high, a rising edge applied to 
TCK while in this state causes the controller to enter the Update-IR state, which 
terminates the scanning process. If TMS is held low and a rising edge is applied to 
TCK, the controller enters the Pause-IR state. 


Test data registers selected by the current instruction retain their previous state.
The instruction does not change while the TAP controller is in this state and the 
instruction register retains its state. 


Pause-IR 


This controller state allows shifting of the instruction register to be temporarily 
halted. 


Test data registers selected by the current instruction retain their previous state. 
The instruction does not change while the TAP controller is in this state and the 
instruction register retains its state. 


Exit2-IR 


This is a temporary controller state. If TMS is held high and a rising edge is applied
to TCK while in this state, the scanning process terminates and the TAP
controller enters the Update-IR controller state. If TMS is held low and a rising
edge is applied to TCK, the controller enters the Shift-IR state.


Test data registers selected by the current instruction retain their previous state. 
The instruction does not change while the TAP controller is in this state and the 
instruction register retains its state. 


Update-IR


The instruction shifted into the instruction register is latched onto the parallel
output from the shift-register path on the falling edge of TCK in this controller state.
Once the new instruction has been latched, it becomes the current instruction.


Test data registers selected by the current instruction retain their previous state. 


The Pause-DR and Pause-IR controller states are included so that shifting of data 
through the test data or instruction register can be temporarily halted. For exam- 
ple, this might be necessary in order to allow an ATE system to reload its pin 
memory from disc during application of a long test sequence. 
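
A pause in the middle of a data scan can be expressed as a short TMS sequence. The sketch below models only the four data-register states involved, with transitions taken from the state descriptions above; the helper name and the number of idle clocks are hypothetical.

```python
# state -> (next state if TMS=0, next state if TMS=1), data-register column only.
DR_NEXT = {
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
}


def pause_and_resume(idle_clocks):
    """TMS values that leave Shift-DR, wait in Pause-DR, and return to Shift-DR."""
    return [1, 0] + [0] * idle_clocks + [1, 0]


state = "Shift-DR"
trace = []
for tms in pause_and_resume(idle_clocks=3):
    state = DR_NEXT[state][tms]
    trace.append(state)
print(trace)
```

The controller stays in Pause-DR for as long as TMS is held low, so the tester can extend the pause by any number of TCK cycles before resuming the shift.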


Figure A-7. TAP Controller State Diagram


A.6 JTAG Pin List


The JTAG cells are arranged in a shift-register configuration (see Figure A-8).
When a JTAG pattern is shifted in through TDI, the LSB corresponds to the
JTAG cell for the -TIMER_OVF pin and the MSB corresponds to the JTAG cell for
the CLK_ENB pin. On the TDO output, the first bit out is the -TIMER_OVF JTAG
cell value and the last bit out is the CLK_ENB JTAG cell value. Table A-1 lists the
order of all of the JTAG cells.


Table A-1: JTAG Pin Order

No.   JTAG Cell      Type     JTAG Cell Function

1     -TIMER_OVF     output   Timer overflow pin
2     XTAL1          input    Crystal input
3     EMU_BRK        input    Emulator break input
4     icediojo†      output   EMU_D bus bidirectional control signal
                              emudiojo = 0: EMU_D bus is an output
                              emudiojo = 1: EMU_D bus is an input
      EMU_D_i<7>     input    Input bit 7 of EMU_D<7:0> bus
      EMU_D_o<7>     output   Output bit 7 of EMU_D<7:0> bus
      ...
19    EMU_D_i<0>     input    Input bit 0 of EMU_D<7:0> bus
      EMU_D_o<0>     output   Output bit 0 of EMU_D<7:0> bus
                     output   -EMU_ENB bidirectional control signal
                              emuenblio = 1: -EMU_ENB is an input
                              emuenblio = 0: -EMU_ENB is an output
      -EMU_ENB_i     input    Input bit of -EMU_ENB pin
      -EMU_ENB_o     output   Output bit of -EMU_ENB pin
                              D<31:0> bus bidirectional control signal
                              dbusiojo = 1: D<31:0> bus is an input
                              dbusiojo = 0: D<31:0> bus is an output
      D_i<31>        input    Input bit 31 of D<31:0> bus
      D_o<31>        output   Output bit 31 of D<31:0> bus
      ...
      D_o<0>         output   Output bit 0 of D<31:0> bus
      -RESET         input    Chip reset pin
      -MEXC          input    Memory exception input
92    -READY         input    External memory transaction complete signal
93    tstatejo†      output   Three-state control signal
                              If tstatejo = 1, the following pins are
                              three-stated: ADR<31:2>, ASI<7:0>, -BE<3:0>,
                              -AS, RD/WR, -LOCK
94    -BGRNT
95    -ERROR
96    -LOCK
97    -RD/WR
      -CS<0>
      ...
104   -CS<5>         output   MSB of chip select output signal
105   -SAME_PAGE     output   Same-Page output signal
106   -BE<3>         output   Byte 3 enable output signal
      ...
109   -BE<0>         output   Byte 0 enable output signal
110   ASI<0>         output   LSB of ASI output pins
      ...
117   ASI<7>         output   MSB of ASI output pins
118   ADR<2>         output   LSB of address output pins
      ...
147   ADR<31>        output   MSB of address output pins
148   IRL<3>         input    MSB of interrupt request pins
      ...
151   IRL<0>         input    LSB of interrupt request pins
152   CLK_ENB        input    PLL control pin
                              CLK_ENB = 1: PLL on
                              CLK_ENB = 0: PLL off

† These are internal I/O control signals. Therefore, there are no corresponding external pins.
1. The following pins are not three-statable: -SAME_PAGE, -CS<5:0>, -BGRNT, -TIMER_OVF, -ERROR.
2. The following pins have no corresponding JTAG cells: CLKOUT1, CLKOUT2, XTAL2, -TRST, TCK, TMS, TDI, TDO.
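
The bit ordering described in this section can be sketched as follows: cell 1 (-TIMER_OVF) is the LSB of the pattern and cell 152 (CLK_ENB) is the MSB. The helper names are hypothetical, and the sketch assumes that the LSB is the first bit presented at TDI, which is consistent with the first bit out of TDO belonging to the -TIMER_OVF cell.

```python
NUM_CELLS = 152  # length of the boundary-scan register


def build_pattern(cell_values):
    """Pack {cell number: bit} into an integer; cell 1 is the LSB."""
    pattern = 0
    for cell, bit in cell_values.items():
        if not 1 <= cell <= NUM_CELLS:
            raise ValueError("cell number out of range: %d" % cell)
        if bit:
            pattern |= 1 << (cell - 1)
    return pattern


def shift_order(pattern):
    """Return the bits in the order they are shifted (assumed LSB first)."""
    return [(pattern >> i) & 1 for i in range(NUM_CELLS)]


# Example: set the -TIMER_OVF cell (1), the three-state control cell (93),
# and the CLK_ENB cell (152); all other cells are 0.
pattern = build_pattern({1: 1, 93: 1, 152: 1})
bits = shift_order(pattern)
print(bits[0], bits[92], bits[151], sum(bits))
```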